Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/14 03:49:52 UTC

[GitHub] [hudi] scxwhite opened a new pull request #5030: [HUDI-3617] MOR compact improve

scxwhite opened a new pull request #5030:
URL: https://github.com/apache/hudi/pull/5030


   
   
   ## What is the purpose of the pull request
   
   In most business scenarios, the latest data is in the latest delta log file, so we sort the delta log files in descending order of instant time. This largely avoids rewriting data during compaction and therefore reduces compaction time.
   
   ## Brief change log
   
   
     - Change compaction plan generation in HoodieCompactor
     - Change record merging in HoodieMergedLogRecordScanner
     - Add three tests in TestHoodieCompactor
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#issuecomment-1068122560


   @alexeykudinkin : can you review this please.





[GitHub] [hudi] hudi-bot commented on pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#issuecomment-1066312776


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 662b700342281bad16175f2135b5c5b75d56aea4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] scxwhite commented on a change in pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
scxwhite commented on a change in pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#discussion_r827820051



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -280,8 +281,11 @@ HoodieCompactionPlan generateCompactionPlan(
         .getLatestFileSlices(partitionPath)
         .filter(slice -> !fgIdsInPendingCompactionAndClustering.contains(slice.getFileGroupId()))
         .map(s -> {
+          // In most business scenarios, the latest data is in the latest delta log file, so we sort it from large
+          // to small according to the instance time, which can largely avoid rewriting the data in the
+          // compact process, and then optimize the compact time
           List<HoodieLogFile> logFiles =

Review comment:
       > Kind of got your idea, then i think we should always use the reverse order and the comparing sequence in merge reader should also be reversed to keep the process time semantics.
   
   Yes. Do you mean here? https://github.com/apache/hudi/pull/5030/files#diff-c2f73f1ce4c0687cffa73e96b82514aca3a930ec1a8bc0c2efd73d7cf869c883R150
   If so, that code has already been modified.










[GitHub] [hudi] hudi-bot commented on pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#issuecomment-1066398223


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6905",
       "triggerID" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6908",
       "triggerID" : "1066397758",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 662b700342281bad16175f2135b5c5b75d56aea4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6905) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6908) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] scxwhite commented on pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
scxwhite commented on pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#issuecomment-1066397758


   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#issuecomment-1066435964


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6905",
       "triggerID" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6908",
       "triggerID" : "1066397758",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 662b700342281bad16175f2135b5c5b75d56aea4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6905) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6908) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#issuecomment-1066313536


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6905",
       "triggerID" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 662b700342281bad16175f2135b5c5b75d56aea4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6905) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] danny0405 commented on a change in pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#discussion_r827577363



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -280,8 +281,11 @@ HoodieCompactionPlan generateCompactionPlan(
         .getLatestFileSlices(partitionPath)
         .filter(slice -> !fgIdsInPendingCompactionAndClustering.contains(slice.getFileGroupId()))
         .map(s -> {
+          // In most business scenarios, the latest data is in the latest delta log file, so we sort it from large
+          // to small according to the instance time, which can largely avoid rewriting the data in the
+          // compact process, and then optimize the compact time
           List<HoodieLogFile> logFiles =

Review comment:
       I think I get your idea. In that case, I think we should always use the reverse order, and the comparison order in the merge reader should also be reversed to preserve the processing-time semantics.
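
One possible reading of "reversing the comparison" is about ties: when two updates carry the same ordering value, the update from the later commit should still win. The sketch below is only an illustration under that assumption. `TieBreakSketch`, `Rec`, and `winner` are hypothetical names and are not Hudi classes or methods; the real merge goes through HoodieRecordPayload#preCombine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TieBreakSketch {

    // Hypothetical stand-in for one update of a single record key.
    static final class Rec {
        final String commit;    // label of the commit that wrote the update
        final long orderingVal; // plays the role of preCombineField

        Rec(String commit, long orderingVal) {
            this.commit = commit;
            this.orderingVal = orderingVal;
        }
    }

    // Returns the commit label of the surviving update, or null for empty input.
    // The input is always given oldest commit first; reverseRead only changes
    // the order in which the updates are visited.
    static String winner(List<Rec> updatesOldestFirst, boolean reverseRead) {
        List<Rec> readOrder = new ArrayList<>(updatesOldestFirst);
        if (reverseRead) {
            Collections.reverse(readOrder);
        }
        Rec current = null;
        for (Rec incoming : readOrder) {
            if (current == null) {
                current = incoming;
                continue;
            }
            // Natural order: the incoming update is newer, so it wins ties (>=).
            // Reverse order: the stored update is newer, so it keeps ties (>).
            boolean incomingWins = reverseRead
                ? incoming.orderingVal > current.orderingVal
                : incoming.orderingVal >= current.orderingVal;
            if (incomingWins) {
                current = incoming;
            }
        }
        return current == null ? null : current.commit;
    }

    public static void main(String[] args) {
        List<Rec> tie = Arrays.asList(new Rec("commit1", 5L), new Rec("commit2", 5L));
        System.out.println(winner(tie, false)); // commit2
        System.out.println(winner(tie, true));  // commit2
    }
}
```

With the comparison flipped together with the read order, both read orders pick the same survivor, including on ties.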













[GitHub] [hudi] hudi-bot commented on pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#issuecomment-1066383319


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6905",
       "triggerID" : "662b700342281bad16175f2135b5c5b75d56aea4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 662b700342281bad16175f2135b5c5b75d56aea4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6905) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] danny0405 commented on a change in pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#discussion_r825624914



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -280,8 +281,11 @@ HoodieCompactionPlan generateCompactionPlan(
         .getLatestFileSlices(partitionPath)
         .filter(slice -> !fgIdsInPendingCompactionAndClustering.contains(slice.getFileGroupId()))
         .map(s -> {
+          // In most business scenarios, the latest data is in the latest delta log file, so we sort it from large
+          // to small according to the instance time, which can largely avoid rewriting the data in the
+          // compact process, and then optimize the compact time
           List<HoodieLogFile> logFiles =

Review comment:
       What do you mean by `avoid rewriting the data in the compact process` here? Shouldn't the reader have the same merged content no matter what the read sequence is for log files?










[GitHub] [hudi] scxwhite commented on a change in pull request #5030: [HUDI-3617] MOR compact improve

Posted by GitBox <gi...@apache.org>.
scxwhite commented on a change in pull request #5030:
URL: https://github.com/apache/hudi/pull/5030#discussion_r825649696



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -280,8 +281,11 @@ HoodieCompactionPlan generateCompactionPlan(
         .getLatestFileSlices(partitionPath)
         .filter(slice -> !fgIdsInPendingCompactionAndClustering.contains(slice.getFileGroupId()))
         .map(s -> {
+          // In most business scenarios, the latest data is in the latest delta log file, so we sort it from large
+          // to small according to the instance time, which can largely avoid rewriting the data in the
+          // compact process, and then optimize the compact time
           List<HoodieLogFile> logFiles =

Review comment:
       > What do you mean by `avoid rewriting the data in the compact process` here? Shouldn't the reader have the same merged content no matter what the read sequence is for log files?
   
   @danny0405 Thanks for your quick reply.
   
   What I mean is that, while reading the delta log files, we can put the latest data into the ExternalSpillableMap (HoodieMergedLogRecordScanner#records) first.
   A brief explanation:
   
   Suppose the base file contains a record with recordKey = 1, preCombineField = 1, and some other fields.
   **First commit: recordKey = 1, preCombineField = 2, and some other updated fields. Generates delta log1.**
   **Second commit: recordKey = 1, preCombineField = 3, and some other updated fields. Generates delta log2.**
   **Third commit: recordKey = 1, preCombineField = 4, and some other updated fields. Generates delta log3.**
   Three delta log files are generated after the three commits.
   
   When compaction is triggered and the delta log files are sorted in natural order, reading delta log1 first puts the record (recordKey = 1, preCombineField = 2) into the map (HoodieMergedLogRecordScanner#records). Reading delta log2 then overwrites it with the record (recordKey = 1, preCombineField = 3), and so on.
   
   However, if the delta log files are sorted in reverse order by instant time, we first put the record from delta log3 (recordKey = 1, preCombineField = 4) into the ExternalSpillableMap. When we then read delta log2 and delta log1, their records are not selected by HoodieRecordPayload#preCombine because their preCombineField values are smaller and the data is older.
   This is what I meant by "avoid rewriting the data in the compact process".
   In addition, reducing the number of rewrites saves a lot of time when the ExternalSpillableMap exceeds its memory limit and spills to disk.
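
The three-commit example above can be sketched as a small simulation that counts how often an existing map entry is overwritten in each read order. This is a simplified model and not Hudi code: `LogMergeOrderSketch`, `Rec`, `threeCommits`, and `countOverwrites` are hypothetical names, a plain `HashMap` stands in for the ExternalSpillableMap, and a strict "larger ordering value wins" check stands in for HoodieRecordPayload#preCombine.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogMergeOrderSketch {

    // Hypothetical stand-in for a log record.
    static final class Rec {
        final String key;
        final long orderingVal; // plays the role of preCombineField

        Rec(String key, long orderingVal) {
            this.key = key;
            this.orderingVal = orderingVal;
        }
    }

    // Builds delta log1..log3 from the example, each holding one update of
    // recordKey = 1. With newestFirst = true the list is reversed (log3 first).
    static List<List<Rec>> threeCommits(boolean newestFirst) {
        List<List<Rec>> logs = new ArrayList<>();
        logs.add(Collections.singletonList(new Rec("1", 2L)));
        logs.add(Collections.singletonList(new Rec("1", 3L)));
        logs.add(Collections.singletonList(new Rec("1", 4L)));
        if (newestFirst) {
            Collections.reverse(logs);
        }
        return logs;
    }

    // Reads the log files in the given order into 'merged' and returns how many
    // times an already stored entry was replaced by an incoming record.
    static int countOverwrites(List<List<Rec>> logFiles, Map<String, Rec> merged) {
        int overwrites = 0;
        for (List<Rec> logFile : logFiles) {
            for (Rec incoming : logFile) {
                Rec existing = merged.get(incoming.key);
                if (existing == null) {
                    merged.put(incoming.key, incoming); // first sighting, no rewrite
                } else if (incoming.orderingVal > existing.orderingVal) {
                    merged.put(incoming.key, incoming); // rewrite of a stored entry
                    overwrites++;
                }
            }
        }
        return overwrites;
    }

    public static void main(String[] args) {
        Map<String, Rec> natural = new HashMap<>();
        Map<String, Rec> reversed = new HashMap<>();
        System.out.println("natural order overwrites: "
            + countOverwrites(threeCommits(false), natural)); // 2
        System.out.println("reverse order overwrites: "
            + countOverwrites(threeCommits(true), reversed)); // 0
    }
}
```

In this model both read orders end with preCombineField = 4 for recordKey = 1, so the merged content is the same; only the number of map rewrites differs.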
   
   
   
   
   
   






