Posted to commits@hudi.apache.org by "waitingF (via GitHub)" <gi...@apache.org> on 2023/03/20 17:32:34 UTC

[GitHub] [hudi] waitingF opened a new pull request, #8247: [SUPPORT] deltastreamer support migrate COW table to MOR

waitingF opened a new pull request, #8247:
URL: https://github.com/apache/hudi/pull/8247

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?  yes
   
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org. I have sent the subscription email; please help add me.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   My company uses deltastreamer to ingest data from Kafka to HDFS, and our existing Hudi tables are COW. The write latency of a COW table is much higher than that of a MOR table, so we are going to migrate the COW tables to MOR.
   According to the [FAQ](https://hudi.apache.org/docs/faq/#how-to-convert-an-existing-cow-table-to-mor), we can convert an existing COW table to MOR by just changing the `hoodie.table.type` property.
   But there is one issue when deltastreamer continues writing to the MOR table on the existing path: the checkpoint from the old COW table is lost, so data loss is possible in such cases.
   
   I found the cause of the checkpoint loss.
   In the `refreshTimeline` method, when the table is MOR, the checkpoint is read only from deltacommits. That is why the checkpoint is lost when migrating from COW to MOR: https://github.com/apache/hudi/blob/ce21873f332f1728ad46aeb066777e49456ef522/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L325
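
A minimal sketch of the lookup behaviour described above, to make the failure mode concrete. It does not use Hudi classes: `Instant`, `latestCheckpointDeltaCommitsOnly`, and `latestCheckpointWithCowFallback` are hypothetical names, the timeline is modelled as a plain list, and the checkpoint strings are illustrative only. The fallback variant is one possible way to keep the COW checkpoint, not necessarily what the PR implements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Simplified stand-in for a table timeline; none of these types are Hudi classes.
class CheckpointLookupSketch {

    // A completed instant: its action ("commit" for COW writes, "deltacommit" for
    // MOR writes) and the checkpoint stored with it (null if it carries none).
    static final class Instant {
        final String action;
        final String checkpoint;

        Instant(String action, String checkpoint) {
            this.action = action;
            this.checkpoint = checkpoint;
        }
    }

    // Models the reported behaviour: for a MOR table only deltacommits are consulted,
    // so a freshly converted table with only commits yields no checkpoint.
    static Optional<String> latestCheckpointDeltaCommitsOnly(List<Instant> timeline) {
        return latestCheckpoint(timeline, Arrays.asList("deltacommit"));
    }

    // Hypothetical migration-aware lookup: also consult commits written while the
    // table was still COW.
    static Optional<String> latestCheckpointWithCowFallback(List<Instant> timeline) {
        return latestCheckpoint(timeline, Arrays.asList("deltacommit", "commit"));
    }

    // Walks the timeline from newest to oldest and returns the first checkpoint
    // found on an instant whose action is in the allowed set.
    private static Optional<String> latestCheckpoint(List<Instant> timeline, List<String> actions) {
        for (int i = timeline.size() - 1; i >= 0; i--) {
            Instant instant = timeline.get(i);
            if (actions.contains(instant.action) && instant.checkpoint != null) {
                return Optional.of(instant.checkpoint);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Timeline of a table that was COW until now: commits only, oldest first.
        List<Instant> timeline = Arrays.asList(
            new Instant("commit", "topic,0:100"),
            new Instant("commit", "topic,0:250"));

        System.out.println(latestCheckpointDeltaCommitsOnly(timeline));
        System.out.println(latestCheckpointWithCowFallback(timeline));
    }
}
```

In this model, the first lookup finds nothing right after the table type is switched, which corresponds to the ingestion restarting without the previous checkpoint. Once the first deltacommit is written, both lookups agree again.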
   
   Simply put, I want deltastreamer to support migrating a Hudi table directly via a parameter.
   
   
   **Expected behavior**
   
   deltastreamer offers a `--migration-type` parameter to support migrating an existing COW table to MOR.
   The value of `--migration-type` can be:
   1. NONE (default, no migration)
   2. COW_TO_MOR
   3. MOR_TO_COW (TODO)
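
The three values listed above could be modelled roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: `MigrationTypeSketch`, `MigrationType`, and `parse` are hypothetical names, and the hand-rolled argument scan merely stands in for whatever option parsing deltastreamer actually uses.

```java
import java.util.Locale;

// Hypothetical model of the proposed --migration-type option; not Hudi code.
class MigrationTypeSketch {

    // The values proposed in the PR description.
    enum MigrationType { NONE, COW_TO_MOR, MOR_TO_COW }

    // Scans the arguments for "--migration-type <VALUE>" and returns the parsed value.
    // Defaults to NONE when the option is absent.
    static MigrationType parse(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (!"--migration-type".equals(args[i])) {
                continue;
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("--migration-type requires a value");
            }
            // valueOf throws IllegalArgumentException for unknown values.
            MigrationType type = MigrationType.valueOf(args[i + 1].toUpperCase(Locale.ROOT));
            if (type == MigrationType.MOR_TO_COW) {
                // Marked as TODO in the description, so reject it explicitly here.
                throw new UnsupportedOperationException("MOR_TO_COW is not supported yet");
            }
            return type;
        }
        return MigrationType.NONE;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"--migration-type", "COW_TO_MOR"}));
    }
}
```

Defaulting to `NONE` keeps existing jobs unchanged unless the option is passed explicitly.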
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8247: [SUPPORT] deltastreamer support migrate COW table to MOR

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8247:
URL: https://github.com/apache/hudi/pull/8247#issuecomment-1477220410

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "11097d2037389facd18754addb493621aa3f59a6",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15816",
       "triggerID" : "11097d2037389facd18754addb493621aa3f59a6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05723c875468112b99e882bf2930177efcb3c6ce",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15822",
       "triggerID" : "05723c875468112b99e882bf2930177efcb3c6ce",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 11097d2037389facd18754addb493621aa3f59a6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15816) 
   * 05723c875468112b99e882bf2930177efcb3c6ce Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15822) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8247: [SUPPORT] deltastreamer support migrate COW table to MOR

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8247:
URL: https://github.com/apache/hudi/pull/8247#issuecomment-1476694826

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "11097d2037389facd18754addb493621aa3f59a6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "11097d2037389facd18754addb493621aa3f59a6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 11097d2037389facd18754addb493621aa3f59a6 UNKNOWN
   




[GitHub] [hudi] waitingF closed pull request #8247: [SUPPORT] deltastreamer support migrate COW table to MOR

Posted by "waitingF (via GitHub)" <gi...@apache.org>.
waitingF closed pull request #8247: [SUPPORT] deltastreamer support migrate COW table to MOR
URL: https://github.com/apache/hudi/pull/8247




[GitHub] [hudi] hudi-bot commented on pull request #8247: [SUPPORT] deltastreamer support migrate COW table to MOR

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8247:
URL: https://github.com/apache/hudi/pull/8247#issuecomment-1477319821

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "11097d2037389facd18754addb493621aa3f59a6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15816",
       "triggerID" : "11097d2037389facd18754addb493621aa3f59a6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05723c875468112b99e882bf2930177efcb3c6ce",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15822",
       "triggerID" : "05723c875468112b99e882bf2930177efcb3c6ce",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 05723c875468112b99e882bf2930177efcb3c6ce Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15822) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8247: [SUPPORT] deltastreamer support migrate COW table to MOR

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8247:
URL: https://github.com/apache/hudi/pull/8247#issuecomment-1476708144

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "11097d2037389facd18754addb493621aa3f59a6",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15816",
       "triggerID" : "11097d2037389facd18754addb493621aa3f59a6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 11097d2037389facd18754addb493621aa3f59a6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15816) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8247: [SUPPORT] deltastreamer support migrate COW table to MOR

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8247:
URL: https://github.com/apache/hudi/pull/8247#issuecomment-1477214257

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "11097d2037389facd18754addb493621aa3f59a6",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15816",
       "triggerID" : "11097d2037389facd18754addb493621aa3f59a6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05723c875468112b99e882bf2930177efcb3c6ce",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "05723c875468112b99e882bf2930177efcb3c6ce",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 11097d2037389facd18754addb493621aa3f59a6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15816) 
   * 05723c875468112b99e882bf2930177efcb3c6ce UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8247: [SUPPORT] deltastreamer support migrate COW table to MOR

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8247:
URL: https://github.com/apache/hudi/pull/8247#issuecomment-1477097204

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "11097d2037389facd18754addb493621aa3f59a6",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15816",
       "triggerID" : "11097d2037389facd18754addb493621aa3f59a6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 11097d2037389facd18754addb493621aa3f59a6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15816) 
   

