You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "hbgstc123 (via GitHub)" <gi...@apache.org> on 2023/03/19 08:04:42 UTC

[GitHub] [hudi] hbgstc123 opened a new pull request, #8232: [HUDI-5955] fix incremental clean not work caused by archive

hbgstc123 opened a new pull request, #8232:
URL: https://github.com/apache/hudi/pull/8232

   ### Change Logs
   
   Current incremental clean action may miss data files that should be cleaned, if the commit instants of those data files were archived.
   
   This pr make sure when incremental clean enabled, `HoodieTimelineArchiver` won't archive commit instants later than or equals to the `earliestCommitToRetain` of last complete clean instant, so that clean executor can find those file in active timeline.
   
   ### Impact
   
   no
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   no
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hbgstc123 commented on pull request #8232: [HUDI-5955] fix incremental clean not work caused by archive

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 commented on PR #8232:
URL: https://github.com/apache/hudi/pull/8232#issuecomment-1495239401

   https://github.com/apache/hudi/pull/8373
   
   I submit a new pr that fallback to full clean if instant needed for incremental clean is archived.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hbgstc123 closed pull request #8232: [HUDI-5955] fix incremental clean not work caused by archive

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 closed pull request #8232: [HUDI-5955] fix incremental clean not work caused by archive
URL: https://github.com/apache/hudi/pull/8232


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8232: [HUDI-5955] fix incremental clean not work caused by archive

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8232:
URL: https://github.com/apache/hudi/pull/8232#issuecomment-1475829546

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d76af2bbfb5a27067d5ac2907dfc48034b734fca",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15798",
       "triggerID" : "d76af2bbfb5a27067d5ac2907dfc48034b734fca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d76af2bbfb5a27067d5ac2907dfc48034b734fca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15798) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #8232: [HUDI-5955] fix incremental clean not work caused by archive

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8232:
URL: https://github.com/apache/hudi/pull/8232#discussion_r1141598861


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##########
@@ -474,13 +477,36 @@ private Stream<HoodieInstant> getCommitInstantsToArchive() throws IOException {
               oldestInstantToRetainForClustering.map(instantToRetain ->
                       HoodieTimeline.compareTimestamps(s.getTimestamp(), LESSER_THAN, instantToRetain.getTimestamp()))
                   .orElse(true)
+          ).filter(s ->
+              latestCleanRetainedInstant.map(instant ->
+                      compareTimestamps(s.getTimestamp(), LESSER_THAN, instant))

Review Comment:
   I can get the error use case for incremental cleaning, but block the archiving of instants with cleaning is kind of risky, it is better we fix the incremental cleaning procedure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8232: [HUDI-5955] fix incremental clean not work caused by archive

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8232:
URL: https://github.com/apache/hudi/pull/8232#issuecomment-1475591003

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d76af2bbfb5a27067d5ac2907dfc48034b734fca",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d76af2bbfb5a27067d5ac2907dfc48034b734fca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d76af2bbfb5a27067d5ac2907dfc48034b734fca UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #8232: [HUDI-5955] fix incremental clean not work caused by archive

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8232:
URL: https://github.com/apache/hudi/pull/8232#discussion_r1142970640


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##########
@@ -474,13 +477,36 @@ private Stream<HoodieInstant> getCommitInstantsToArchive() throws IOException {
               oldestInstantToRetainForClustering.map(instantToRetain ->
                       HoodieTimeline.compareTimestamps(s.getTimestamp(), LESSER_THAN, instantToRetain.getTimestamp()))
                   .orElse(true)
+          ).filter(s ->
+              latestCleanRetainedInstant.map(instant ->
+                      compareTimestamps(s.getTimestamp(), LESSER_THAN, instant))

Review Comment:
   archived timeline should also be good if the loading does not happen in high frequency.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8232: [HUDI-5955] fix incremental clean not work caused by archive

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8232:
URL: https://github.com/apache/hudi/pull/8232#issuecomment-1475614869

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d76af2bbfb5a27067d5ac2907dfc48034b734fca",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15798",
       "triggerID" : "d76af2bbfb5a27067d5ac2907dfc48034b734fca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d76af2bbfb5a27067d5ac2907dfc48034b734fca Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15798) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8232: [HUDI-5955] fix incremental clean not work caused by archive

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 commented on code in PR #8232:
URL: https://github.com/apache/hudi/pull/8232#discussion_r1141622129


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##########
@@ -474,13 +477,36 @@ private Stream<HoodieInstant> getCommitInstantsToArchive() throws IOException {
               oldestInstantToRetainForClustering.map(instantToRetain ->
                       HoodieTimeline.compareTimestamps(s.getTimestamp(), LESSER_THAN, instantToRetain.getTimestamp()))
                   .orElse(true)
+          ).filter(s ->
+              latestCleanRetainedInstant.map(instant ->
+                      compareTimestamps(s.getTimestamp(), LESSER_THAN, instant))

Review Comment:
   we can fallback to full cleaning if there are archived retained commits.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8232: [HUDI-5955] fix incremental clean not work caused by archive

Posted by "hbgstc123 (via GitHub)" <gi...@apache.org>.
hbgstc123 commented on code in PR #8232:
URL: https://github.com/apache/hudi/pull/8232#discussion_r1141845840


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##########
@@ -474,13 +477,36 @@ private Stream<HoodieInstant> getCommitInstantsToArchive() throws IOException {
               oldestInstantToRetainForClustering.map(instantToRetain ->
                       HoodieTimeline.compareTimestamps(s.getTimestamp(), LESSER_THAN, instantToRetain.getTimestamp()))
                   .orElse(true)
+          ).filter(s ->
+              latestCleanRetainedInstant.map(instant ->
+                      compareTimestamps(s.getTimestamp(), LESSER_THAN, instant))

Review Comment:
   or just use archived timeline to continue incremental clean



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org