Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/07 01:58:07 UTC

[GitHub] [hudi] nsivabalan opened a new pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

nsivabalan opened a new pull request #4530:
URL: https://github.com/apache/hudi/pull/4530


   ## What is the purpose of the pull request
   
   - When committing a write, we commit to the metadata table first and then to the data table. At the end of the metadata table commit, we also trigger compaction if the conditions are met. If the write then fails in the data table, the compaction may already have included the uncommitted data, and once compacted, that data can no longer be ignored when reading from the metadata table. This patch fixes the bug by triggering metadata table compaction before the commit is applied to the metadata table.
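
To make the ordering change concrete, here is a minimal sketch. It is a toy model, not Hudi code: `MetadataTableSketch`, `commitThenCompact`, and `compactThenCommit` are hypothetical names, and compaction is assumed to run on every call rather than only when the conditions are met.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the metadata table (MDT); this is not Hudi's API.
final class MetadataTableSketch {
  // Delta commits applied to the MDT, in the order they were applied.
  final List<String> deltaCommits = new ArrayList<>();
  // Delta commits that compaction has already folded in.
  final List<String> compacted = new ArrayList<>();

  // Compaction folds in every delta commit applied so far.
  void compact() {
    compacted.clear();
    compacted.addAll(deltaCommits);
  }

  // Ordering before the patch: apply the commit to the MDT, then compact.
  void commitThenCompact(String instantTime) {
    deltaCommits.add(instantTime);
    compact();
  }

  // Ordering with the patch: compact first, then apply the commit to the MDT.
  void compactThenCommit(String instantTime) {
    compact();
    deltaCommits.add(instantTime);
  }
}
```

With the first ordering, the commit being applied is already part of the compacted set even if the data table write fails afterwards. With the second ordering, it is never part of the compaction that runs in the same call.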
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] manojpec commented on a change in pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
manojpec commented on a change in pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#discussion_r781400698



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -689,7 +689,7 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     String latestDeltacommitTime = metadataMetaClient.reloadActiveTimeline().getDeltaCommitTimeline().filterCompletedInstants().lastInstant()
         .get().getTimestamp();
     List<HoodieInstant> pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
-        .findInstantsBefore(latestDeltacommitTime).getInstants().collect(Collectors.toList());
+        .findInstantsBefore(instantTime).getInstants().collect(Collectors.toList());

Review comment:
       Right, I was actually asking for the compaction time to be C10 and not C11. I misread line 689. Looks good then.







[GitHub] [hudi] nsivabalan merged pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
nsivabalan merged pull request #4530:
URL: https://github.com/apache/hudi/pull/4530


   





[GitHub] [hudi] nsivabalan commented on pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#issuecomment-1007385719


   @manojpec: Can you review the patch too?





[GitHub] [hudi] prashantwason commented on a change in pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#discussion_r780621073



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -689,7 +689,7 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     String latestDeltacommitTime = metadataMetaClient.reloadActiveTimeline().getDeltaCommitTimeline().filterCompletedInstants().lastInstant()
         .get().getTimestamp();
     List<HoodieInstant> pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
-        .findInstantsBefore(latestDeltacommitTime).getInstants().collect(Collectors.toList());
+        .findInstantsBefore(instantTime).getInstants().collect(Collectors.toList());
 
     if (!pendingInstants.isEmpty()) {
       LOG.info(String.format("Cannot compact metadata table as there are %d inflight instants before latest deltacommit %s: %s",

Review comment:
       before latest instantTime







[GitHub] [hudi] hudi-bot commented on pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#issuecomment-1007099737


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4956",
       "triggerID" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 33af796c394961a4d9b16dcba8950e68ee018ea5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4956) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] nsivabalan commented on a change in pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#discussion_r781161131



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -689,7 +689,7 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     String latestDeltacommitTime = metadataMetaClient.reloadActiveTimeline().getDeltaCommitTimeline().filterCompletedInstants().lastInstant()
         .get().getTimestamp();
     List<HoodieInstant> pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
-        .findInstantsBefore(latestDeltacommitTime).getInstants().collect(Collectors.toList());
+        .findInstantsBefore(instantTime).getInstants().collect(Collectors.toList());

Review comment:
       Let me try to explain.
   Let's say we have 10 commits, C1 through C10.
   Prior to this patch, we compact immediately after C10, so the compaction commit is C10 + "001".
   
   With this patch, we compact just before C11 starts getting applied to the MDT.
   So I am basing the compaction commit on the latest delta commit time, which is C10, and not on the instant time, which is C11.
   The compaction commit is therefore still C10 + "001". If I went with instantTime, we would change the behavior. In fact, we can't do that, since the compaction time would be greater than the delta commit that will eventually be created when we apply C11 to the MDT.
   
   Let me know if this makes sense.
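
The timestamp arithmetic above can be sketched as follows. This is an illustration only: `CompactionInstantSketch` and its methods are hypothetical, the strings "10" and "11" stand in for C10 and C11, and instants are assumed to be ordered by plain string comparison.

```java
// Hypothetical helper illustrating the C10 + "001" reasoning; not Hudi's API.
final class CompactionInstantSketch {
  // Suffix appended to the base timestamp, as in the C10 + "001" example.
  static final String SUFFIX = "001";

  // Builds the compaction instant time from a base instant time.
  static String compactionInstant(String baseInstant) {
    return baseInstant + SUFFIX;
  }

  // True if a sorts strictly before b, assuming lexicographic ordering of instants.
  static boolean isBefore(String a, String b) {
    return a.compareTo(b) < 0;
  }

  public static void main(String[] args) {
    String c10 = "10";
    String c11 = "11";
    // Based on the latest delta commit (C10): sorts before the upcoming C11 delta commit.
    System.out.println(isBefore(compactionInstant(c10), c11));
    // Based on instantTime (C11): the C11 delta commit sorts before the compaction.
    System.out.println(isBefore(c11, compactionInstant(c11)));
  }
}
```

Under these assumptions, a compaction instant derived from C10 stays ahead of the C11 delta commit on the timeline, while one derived from C11 would land after a delta commit that the compaction does not contain.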
   







[GitHub] [hudi] hudi-bot commented on pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#issuecomment-1007135331


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4956",
       "triggerID" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f67f3a7d1a3de2bd170fed67031a37ad1da5115a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f67f3a7d1a3de2bd170fed67031a37ad1da5115a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 33af796c394961a4d9b16dcba8950e68ee018ea5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4956) 
   * f67f3a7d1a3de2bd170fed67031a37ad1da5115a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#issuecomment-1007080163


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4956",
       "triggerID" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 33af796c394961a4d9b16dcba8950e68ee018ea5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4956) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#issuecomment-1007166010


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4956",
       "triggerID" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f67f3a7d1a3de2bd170fed67031a37ad1da5115a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4962",
       "triggerID" : "f67f3a7d1a3de2bd170fed67031a37ad1da5115a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f67f3a7d1a3de2bd170fed67031a37ad1da5115a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4962) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] manojpec commented on a change in pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
manojpec commented on a change in pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#discussion_r780847869



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -689,7 +689,7 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     String latestDeltacommitTime = metadataMetaClient.reloadActiveTimeline().getDeltaCommitTimeline().filterCompletedInstants().lastInstant()
         .get().getTimestamp();
     List<HoodieInstant> pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
-        .findInstantsBefore(latestDeltacommitTime).getInstants().collect(Collectors.toList());
+        .findInstantsBefore(instantTime).getInstants().collect(Collectors.toList());

Review comment:
       `compactionInstantTime` at line 703 has to be based off `instantTime` and not `latestDeltaCommitTime`. Latest delta commit time is not part of the compaction yet. Otherwise we are changing the meaning of the current compaction timeline with this change. 







[GitHub] [hudi] hudi-bot commented on pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#issuecomment-1007136180


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4956",
       "triggerID" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f67f3a7d1a3de2bd170fed67031a37ad1da5115a",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4962",
       "triggerID" : "f67f3a7d1a3de2bd170fed67031a37ad1da5115a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 33af796c394961a4d9b16dcba8950e68ee018ea5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4956) 
   * f67f3a7d1a3de2bd170fed67031a37ad1da5115a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4962) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#issuecomment-1007078837


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "33af796c394961a4d9b16dcba8950e68ee018ea5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 33af796c394961a4d9b16dcba8950e68ee018ea5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] nsivabalan commented on a change in pull request #4530: [HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4530:
URL: https://github.com/apache/hudi/pull/4530#discussion_r780678908



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -689,7 +689,7 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     String latestDeltacommitTime = metadataMetaClient.reloadActiveTimeline().getDeltaCommitTimeline().filterCompletedInstants().lastInstant()
         .get().getTimestamp();
     List<HoodieInstant> pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
-        .findInstantsBefore(latestDeltacommitTime).getInstants().collect(Collectors.toList());
+        .findInstantsBefore(instantTime).getInstants().collect(Collectors.toList());
 
     if (!pendingInstants.isEmpty()) {
       LOG.info(String.format("Cannot compact metadata table as there are %d inflight instants before latest deltacommit %s: %s",

Review comment:
       This was intentionally switched to `instantTime`. Prior to this patch, `latestDeltacommitTime` referred to the current commit being applied to the MDT, because that commit would already have completed in the MDT by the time we reached this part of the code, and hence it worked. With this patch, `latestDeltacommitTime` refers to the last committed delta commit, and hence I am using `instantTime`, which is the current commit being applied to the MDT.
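
A small sketch of why the cutoff matters for the pending-instants check. It is a simplified model under stated assumptions: `PendingInstantCheckSketch` and `pendingBefore` are hypothetical, instants are plain strings ordered lexicographically, and "before" is assumed to mean strictly before.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the pending-instants check; not Hudi's API.
final class PendingInstantCheckSketch {
  // Returns the pending (inflight or requested) data table instants
  // that sort strictly before the given cutoff.
  static List<String> pendingBefore(List<String> pendingInstants, String cutoff) {
    return pendingInstants.stream()
        .filter(ts -> ts.compareTo(cutoff) < 0)
        .collect(Collectors.toList());
  }
}
```

In this model, suppose the data table still has "10" pending while "11" is the commit currently being applied. A cutoff of "10" (the last completed delta commit) finds nothing pending, whereas a cutoff of "11" (the current instant) finds "10" and would block compaction.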



