You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/22 06:50:15 UTC

[GitHub] [hudi] danny0405 opened a new pull request, #6740: [HUDI-4897] Refactor the merge handle in CDC mode

danny0405 opened a new pull request, #6740:
URL: https://github.com/apache/hudi/pull/6740

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6740: [HUDI-4897] Refactor the merge handle in CDC mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6740:
URL: https://github.com/apache/hudi/pull/6740#issuecomment-1254618334

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e270b9cf24cefb1fef8bfb2e0c91b859242b2a13 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6740: [HUDI-4897] Refactor the merge handle in CDC mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6740:
URL: https://github.com/apache/hudi/pull/6740#issuecomment-1254622940

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11574",
       "triggerID" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e270b9cf24cefb1fef8bfb2e0c91b859242b2a13 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11574) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6740: [HUDI-4897] Refactor the merge handle in CDC mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6740:
URL: https://github.com/apache/hudi/pull/6740#issuecomment-1254850707

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11574",
       "triggerID" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e270b9cf24cefb1fef8bfb2e0c91b859242b2a13 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11574) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6740: [HUDI-4897] Refactor the merge handle in CDC mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6740:
URL: https://github.com/apache/hudi/pull/6740#issuecomment-1256029916

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11574",
       "triggerID" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ebb7fa695913266b919f7d680a60bab5dc8d64d0",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11604",
       "triggerID" : "ebb7fa695913266b919f7d680a60bab5dc8d64d0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ebb7fa695913266b919f7d680a60bab5dc8d64d0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11604) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6740: [HUDI-4897] Refactor the merge handle in CDC mode

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #6740:
URL: https://github.com/apache/hudi/pull/6740#discussion_r980606386


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java:
##########
@@ -210,18 +203,6 @@ private void init(String fileId, String partitionPath, HoodieBaseFile baseFileTo
       // Create the writer for writing the new version file
       fileWriter = createNewFileWriter(instantTime, newFilePath, hoodieTable, config,
         writeSchemaWithMetaFields, taskContextSupplier);
-
-      // init the cdc logger
-      this.cdcEnabled = config.getBooleanOrDefault(HoodieTableConfig.CDC_ENABLED);

Review Comment:
   @danny0405 appreciate your viewpoint regarding separation of concerns in regard to CDC logging, but i don't think this change make sense to me: 
   
    - Instead of having CDC logging centralized in one place we now have it spread across every class that is extending `HoodieMergeHandle` class (`HoodieMergeHandleWithChangelog`, `HoodieSortedMergeHandleWithChangelog`). As i called out on the original PR this is not scalable approach -- we can't introduce sub-classes for every feature we introduce and we need to balance this consideration against other factors such as including, for ex, how many features we have to support (for N features if we do subclassing we will have to do up ton 2^N subclasses). Not long ago [i had to inline sub-classes for WriteHandle](https://github.com/apache/hudi/pull/5470/files#diff-ba11482e3356cce0c6aeed5ba4106e3341b83d347e1c1876caf7c0f13fab52c3) impls for exactly that reason -- w/o it with this new feature we would have to essentially double the number of classes.
   
    - Previously, CDC logic was implemented w/in a single class, this impl now introduces 5 new classes to achieve the same thing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6740: [HUDI-4897] Refactor the merge handle in CDC mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6740:
URL: https://github.com/apache/hudi/pull/6740#issuecomment-1255777800

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11574",
       "triggerID" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ebb7fa695913266b919f7d680a60bab5dc8d64d0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ebb7fa695913266b919f7d680a60bab5dc8d64d0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e270b9cf24cefb1fef8bfb2e0c91b859242b2a13 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11574) 
   * ebb7fa695913266b919f7d680a60bab5dc8d64d0 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 merged pull request #6740: [HUDI-4897] Refactor the merge handle in CDC mode

Posted by GitBox <gi...@apache.org>.
danny0405 merged PR #6740:
URL: https://github.com/apache/hudi/pull/6740


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on pull request #6740: [HUDI-4897] Refactor the merge handle in CDC mode

Posted by GitBox <gi...@apache.org>.
xushiyan commented on PR #6740:
URL: https://github.com/apache/hudi/pull/6740#issuecomment-1255353287

   Follow up for https://github.com/apache/hudi/pull/6697


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6740: [HUDI-4897] Refactor the merge handle in CDC mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6740:
URL: https://github.com/apache/hudi/pull/6740#issuecomment-1255780266

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11574",
       "triggerID" : "e270b9cf24cefb1fef8bfb2e0c91b859242b2a13",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ebb7fa695913266b919f7d680a60bab5dc8d64d0",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11604",
       "triggerID" : "ebb7fa695913266b919f7d680a60bab5dc8d64d0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e270b9cf24cefb1fef8bfb2e0c91b859242b2a13 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11574) 
   * ebb7fa695913266b919f7d680a60bab5dc8d64d0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11604) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #6740: [HUDI-4897] Refactor the merge handle in CDC mode

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #6740:
URL: https://github.com/apache/hudi/pull/6740#discussion_r980626021


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java:
##########
@@ -210,18 +203,6 @@ private void init(String fileId, String partitionPath, HoodieBaseFile baseFileTo
       // Create the writer for writing the new version file
       fileWriter = createNewFileWriter(instantTime, newFilePath, hoodieTable, config,
         writeSchemaWithMetaFields, taskContextSupplier);
-
-      // init the cdc logger
-      this.cdcEnabled = config.getBooleanOrDefault(HoodieTableConfig.CDC_ENABLED);

Review Comment:
   Makes everything into one class is hard to extend for project with huge code bases. And i don't like your refactoring to the SparkRDDRelation and HoodieWriteClient based on the same reason. I have fixed so many bugs/degression after your refactoring in flink side, that makes me feel bad and hard to go on with this project.
   
   Do you think to do a per-record logic switching for non-cdc write path is reasonable ? Sorry i don't think so.
   
   So ignored.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org