Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/13 08:04:01 UTC

[GitHub] [hudi] codope opened a new pull request, #6662: [HUDI-4832] Fix drop partition meta sync

codope opened a new pull request, #6662:
URL: https://github.com/apache/hudi/pull/6662

   ### Change Logs
   
   Fixes issue #6578 
   
   When deciding whether to drop partitions, meta sync should drop only the partitions recorded in the metadata of the latest drop_partition operation (replacecommit).
   
   Also, ensure that the archived timeline since the last synced time is considered when fetching the partitions written since the last synced time.
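
The archived-timeline point can be illustrated with a minimal sketch that uses plain Java stand-ins rather than Hudi's timeline classes. `SketchInstant`, its fields, and `partitionsWrittenSince` are hypothetical names introduced only for this illustration. The assumption is that instants completed after the last sync may already have been moved to the archived timeline, so both timelines are consulted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a completed instant on a timeline (not a Hudi class).
final class SketchInstant {
  final String timestamp;  // instant time, compared lexicographically in this sketch
  final String partition;  // a single partition touched by the instant, for brevity

  SketchInstant(String timestamp, String partition) {
    this.timestamp = timestamp;
    this.partition = partition;
  }
}

final class TimelineSinceLastSyncSketch {

  // Collect partitions written strictly after lastSyncTime, consulting both the
  // archived and the active instants, in instant-time order.
  static List<String> partitionsWrittenSince(String lastSyncTime,
                                             List<SketchInstant> archived,
                                             List<SketchInstant> active) {
    List<SketchInstant> merged = new ArrayList<>(archived);
    merged.addAll(active);
    merged.sort(Comparator.comparing((SketchInstant i) -> i.timestamp));

    List<String> partitions = new ArrayList<>();
    for (SketchInstant instant : merged) {
      boolean afterLastSync = instant.timestamp.compareTo(lastSyncTime) > 0;
      if (afterLastSync && !partitions.contains(instant.partition)) {
        partitions.add(instant.partition);
      }
    }
    return partitions;
  }

  public static void main(String[] args) {
    List<SketchInstant> archived = Arrays.asList(
        new SketchInstant("001", "2022/09/01"),
        new SketchInstant("003", "2022/09/02"));
    List<SketchInstant> active = Arrays.asList(
        new SketchInstant("005", "2022/09/03"));

    // The last sync happened at "002", so instant "003" has to come from the archive.
    System.out.println(partitionsWrittenSince("002", archived, active));
  }
}
```

If only the active list were consulted here, the partition written at instant "003" would never reach the metastore.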
   
   ### Impact
   
   Medium. No public API change, but the issue could lead to wrong results.
   
   Added a test to verify the change.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1245343465

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8d100f5793803b053673f0730ea34b7c75e1d41c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] pratyakshsharma commented on a diff in pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on code in PR #6662:
URL: https://github.com/apache/hudi/pull/6662#discussion_r972042716


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -83,18 +87,24 @@ public boolean isBootstrap() {
     return metaClient.getTableConfig().getBootstrapBasePath().isPresent();
   }
 
-  public boolean isDropPartition() {
+  /**
+   * Get the set of dropped partitions based on the latest commit metadata.
+   * Returns empty set if the latest commit was not due to DELETE_PARTITION operation.
+   */
+  public Set<String> getDroppedPartitions() {
     try {
-      Option<HoodieCommitMetadata> hoodieCommitMetadata = HoodieTableMetadataUtil.getLatestCommitMetadata(metaClient);
+      Option<HoodieCommitMetadata> hoodieCommitMetadata = getLatestCommitMetadata(metaClient);
 
       if (hoodieCommitMetadata.isPresent()
           && WriteOperationType.DELETE_PARTITION.equals(hoodieCommitMetadata.get().getOperationType())) {
-        return true;
+        Map<String, List<String>> partitionToReplaceFileIds =
+            ((HoodieReplaceCommitMetadata) hoodieCommitMetadata.get()).getPartitionToReplaceFileIds();

Review Comment:
   I see.





[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1248491467

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385",
       "triggerID" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 25f03c88cb4d0d18529746053b9575f0574dcfc0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385) 
   


[GitHub] [hudi] xushiyan commented on a diff in pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #6662:
URL: https://github.com/apache/hudi/pull/6662#discussion_r972587570


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -158,4 +171,23 @@ public List<PartitionEvent> getPartitionEvents(List<Partition> tablePartitions,
     }
     return events;
   }
+
+  /**
+   * Get Last commit's Metadata.
+   */
+  private static Option<HoodieCommitMetadata> getLatestCommitMetadata(HoodieTableMetaClient metaClient) {

Review Comment:
   I think the more relevant logic for partition sync is `getCommitMetadataSinceLastSync()`: it should return time-ordered commit metadata, from which (partition, written|dropped) pairs can be extracted and then turned into partition events for sync.
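
One way to read this suggestion, sketched with plain Java stand-ins. `getCommitMetadataSinceLastSync()` is only named in the comment, not defined, so the `CommitInfo` class, the `Action` enum, and the `latestActionPerPartition` helper below are hypothetical. The sketch assumes the commits arrive ordered by instant time, so a later action on a partition overrides an earlier one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PartitionActionSketch {

  enum Action { WRITTEN, DROPPED }

  // Hypothetical stand-in for per-commit metadata (not a Hudi class).
  static final class CommitInfo {
    final String instantTime;
    final Action action;
    final List<String> partitions;

    CommitInfo(String instantTime, Action action, String... partitions) {
      this.instantTime = instantTime;
      this.action = action;
      this.partitions = Arrays.asList(partitions);
    }
  }

  // Walk the commits in time order; the last action seen for a partition wins.
  static Map<String, Action> latestActionPerPartition(List<CommitInfo> commitsSinceLastSync) {
    Map<String, Action> result = new LinkedHashMap<>();
    for (CommitInfo commit : commitsSinceLastSync) {
      for (String partition : commit.partitions) {
        result.put(partition, commit.action);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<CommitInfo> commits = Arrays.asList(
        new CommitInfo("001", Action.WRITTEN, "p1", "p2"),
        new CommitInfo("002", Action.DROPPED, "p1"),
        new CommitInfo("003", Action.WRITTEN, "p1"));

    // p1 is dropped at "002" and written again at "003", so it ends up WRITTEN.
    System.out.println(latestActionPerPartition(commits));
  }
}
```

Keeping the time order matters because a partition that is dropped and later rewritten should produce an add or update event rather than a drop event.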





[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1249958377

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385",
       "triggerID" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11424",
       "triggerID" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a6b676c8a983aad9ef485d73ec1dc7dd462a055a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11424) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1250828889

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385",
       "triggerID" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11424",
       "triggerID" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05774542e7e99184781292e69747f7b20de24e22",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11488",
       "triggerID" : "05774542e7e99184781292e69747f7b20de24e22",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 05774542e7e99184781292e69747f7b20de24e22 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11488) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1250621395

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385",
       "triggerID" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11424",
       "triggerID" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05774542e7e99184781292e69747f7b20de24e22",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "05774542e7e99184781292e69747f7b20de24e22",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a6b676c8a983aad9ef485d73ec1dc7dd462a055a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11424) 
   * 05774542e7e99184781292e69747f7b20de24e22 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1245083845

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8d100f5793803b053673f0730ea34b7c75e1d41c UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1249168943

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385",
       "triggerID" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11424",
       "triggerID" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 25f03c88cb4d0d18529746053b9575f0574dcfc0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385) 
   * a6b676c8a983aad9ef485d73ec1dc7dd462a055a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11424) 
   


[GitHub] [hudi] xushiyan commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
xushiyan commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1249388775

   cc @fengjian428 does this solve your problem with [HUDI-4538](https://issues.apache.org/jira/browse/HUDI-4538)? PTAL




[GitHub] [hudi] codope commented on a diff in pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
codope commented on code in PR #6662:
URL: https://github.com/apache/hudi/pull/6662#discussion_r972808396


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -83,18 +87,24 @@ public boolean isBootstrap() {
     return metaClient.getTableConfig().getBootstrapBasePath().isPresent();
   }
 
-  public boolean isDropPartition() {
+  /**
+   * Get the set of dropped partitions based on the latest commit metadata.
+   * Returns empty set if the latest commit was not due to DELETE_PARTITION operation.
+   */
+  public Set<String> getDroppedPartitions() {
     try {
-      Option<HoodieCommitMetadata> hoodieCommitMetadata = HoodieTableMetadataUtil.getLatestCommitMetadata(metaClient);
+      Option<HoodieCommitMetadata> hoodieCommitMetadata = getLatestCommitMetadata(metaClient);

Review Comment:
   That's right, thanks for bringing it up. I've fixed this now.





[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1248176123

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385",
       "triggerID" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8d100f5793803b053673f0730ea34b7c75e1d41c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322) 
   * 25f03c88cb4d0d18529746053b9575f0574dcfc0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385) 
   


[GitHub] [hudi] fengjian428 commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1250193874

   > cc @fengjian428 does this solve your problem with [HUDI-4538](https://issues.apache.org/jira/browse/HUDI-4538)? PTAL
   
   Yeah, I think so, since it now merges the archived timeline and the active timeline.




[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1249164088

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385",
       "triggerID" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 25f03c88cb4d0d18529746053b9575f0574dcfc0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385) 
   * a6b676c8a983aad9ef485d73ec1dc7dd462a055a UNKNOWN
   


[GitHub] [hudi] pratyakshsharma commented on a diff in pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on code in PR #6662:
URL: https://github.com/apache/hudi/pull/6662#discussion_r971690126


##########
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java:
##########
@@ -88,8 +88,8 @@
 public class TestHiveSyncTool {
 
   private static final List<Object> SYNC_MODES = Arrays.asList(
-      "hiveql",
-      "hms",
+      /*"hiveql",
+      "hms",*/

Review Comment:
   Any reason for commenting these out?



##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -83,18 +87,24 @@ public boolean isBootstrap() {
     return metaClient.getTableConfig().getBootstrapBasePath().isPresent();
   }
 
-  public boolean isDropPartition() {
+  /**
+   * Get the set of dropped partitions based on the latest commit metadata.
+   * Returns empty set if the latest commit was not due to DELETE_PARTITION operation.
+   */
+  public Set<String> getDroppedPartitions() {
     try {
-      Option<HoodieCommitMetadata> hoodieCommitMetadata = HoodieTableMetadataUtil.getLatestCommitMetadata(metaClient);
+      Option<HoodieCommitMetadata> hoodieCommitMetadata = getLatestCommitMetadata(metaClient);
 
       if (hoodieCommitMetadata.isPresent()
           && WriteOperationType.DELETE_PARTITION.equals(hoodieCommitMetadata.get().getOperationType())) {
-        return true;
+        Map<String, List<String>> partitionToReplaceFileIds =
+            ((HoodieReplaceCommitMetadata) hoodieCommitMetadata.get()).getPartitionToReplaceFileIds();

Review Comment:
   Why are we calling `getPartitionToReplaceFileIds()` here? A `DELETE_PARTITION` operation can occur in a normal commit, not only in a replacecommit action, or am I missing something here?





[GitHub] [hudi] codope commented on a diff in pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
codope commented on code in PR #6662:
URL: https://github.com/apache/hudi/pull/6662#discussion_r971980886


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -83,18 +87,24 @@ public boolean isBootstrap() {
     return metaClient.getTableConfig().getBootstrapBasePath().isPresent();
   }
 
-  public boolean isDropPartition() {
+  /**
+   * Get the set of dropped partitions based on the latest commit metadata.
+   * Returns empty set if the latest commit was not due to DELETE_PARTITION operation.
+   */
+  public Set<String> getDroppedPartitions() {
     try {
-      Option<HoodieCommitMetadata> hoodieCommitMetadata = HoodieTableMetadataUtil.getLatestCommitMetadata(metaClient);
+      Option<HoodieCommitMetadata> hoodieCommitMetadata = getLatestCommitMetadata(metaClient);
 
       if (hoodieCommitMetadata.isPresent()
           && WriteOperationType.DELETE_PARTITION.equals(hoodieCommitMetadata.get().getOperationType())) {
-        return true;
+        Map<String, List<String>> partitionToReplaceFileIds =
+            ((HoodieReplaceCommitMetadata) hoodieCommitMetadata.get()).getPartitionToReplaceFileIds();

Review Comment:
   The DELETE_PARTITION operation adds a replacecommit action to the timeline. The partitions that get deleted are contained in `partitionToReplaceFileIds`.
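
A minimal sketch of that relationship, using a plain Java stand-in rather than `HoodieReplaceCommitMetadata`. The `ReplaceMetadata` class and the `droppedPartitions` helper are hypothetical, and the operation type is modeled as a plain string. The only point illustrated is that the keys of the partition-to-replaced-file-ids map are treated as dropped partitions when the operation is a delete-partition.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ReplaceCommitSketch {

  // Hypothetical stand-in for replacecommit metadata; not the real Hudi class.
  static final class ReplaceMetadata {
    final String operationType;
    final Map<String, List<String>> partitionToReplaceFileIds;

    ReplaceMetadata(String operationType, Map<String, List<String>> partitionToReplaceFileIds) {
      this.operationType = operationType;
      this.partitionToReplaceFileIds = partitionToReplaceFileIds;
    }
  }

  // Partitions count as dropped only when the commit is a DELETE_PARTITION;
  // the keys of the replaced-file-id map are the affected partitions.
  static Set<String> droppedPartitions(ReplaceMetadata metadata) {
    if (!"DELETE_PARTITION".equals(metadata.operationType)) {
      return Collections.emptySet();
    }
    return new TreeSet<>(metadata.partitionToReplaceFileIds.keySet());
  }

  public static void main(String[] args) {
    Map<String, List<String>> replaced = new LinkedHashMap<>();
    replaced.put("2022/09/01", Arrays.asList("file-1", "file-2"));
    replaced.put("2022/09/02", Arrays.asList("file-3"));

    // Same replaced-file map, different operation types.
    System.out.println(droppedPartitions(new ReplaceMetadata("DELETE_PARTITION", replaced)));
    System.out.println(droppedPartitions(new ReplaceMetadata("INSERT_OVERWRITE", replaced)));
  }
}
```

The operation-type check is what separates a dropped partition from one whose files were merely replaced by another kind of replacecommit.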





[GitHub] [hudi] pratyakshsharma commented on a diff in pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on code in PR #6662:
URL: https://github.com/apache/hudi/pull/6662#discussion_r972062851


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -83,18 +87,24 @@ public boolean isBootstrap() {
     return metaClient.getTableConfig().getBootstrapBasePath().isPresent();
   }
 
-  public boolean isDropPartition() {
+  /**
+   * Get the set of dropped partitions based on the latest commit metadata.
+   * Returns empty set if the latest commit was not due to DELETE_PARTITION operation.
+   */
+  public Set<String> getDroppedPartitions() {
     try {
-      Option<HoodieCommitMetadata> hoodieCommitMetadata = HoodieTableMetadataUtil.getLatestCommitMetadata(metaClient);
+      Option<HoodieCommitMetadata> hoodieCommitMetadata = getLatestCommitMetadata(metaClient);

Review Comment:
   I believe this is still a problem. Consider a scenario where three commits happen (without syncing to the metastore) in the following order:
   1. upsert
   2. drop_partition
   3. drop_partition
   
   We will miss the partitions dropped in commit 2 if we only look at the latest commit metadata here. I think we should check all the commit metadata since the last sync time with the metastore and then collect the dropped partitions.
   
   It would also be good to add a test case simulating this scenario so that this behavior stays intact in the future.
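
The scenario can be reproduced with a small self-contained sketch. The `Commit` class and both helper methods are hypothetical stand-ins, not Hudi APIs, and the partition names are made up. It compares looking only at the latest commit with looking at every commit since the last sync.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class MissedDropSketch {

  // Hypothetical stand-in for a completed commit; not a Hudi class.
  static final class Commit {
    final String operation;
    final List<String> partitions;

    Commit(String operation, String... partitions) {
      this.operation = operation;
      this.partitions = Arrays.asList(partitions);
    }
  }

  // Looks only at the most recent commit.
  static Set<String> droppedInLatestCommit(List<Commit> commitsSinceLastSync) {
    if (commitsSinceLastSync.isEmpty()) {
      return Collections.emptySet();
    }
    Commit latest = commitsSinceLastSync.get(commitsSinceLastSync.size() - 1);
    if (!"drop_partition".equals(latest.operation)) {
      return Collections.emptySet();
    }
    return new TreeSet<>(latest.partitions);
  }

  // Looks at every commit since the last sync. For brevity this ignores the case
  // where a dropped partition is written again by a later commit.
  static Set<String> droppedSinceLastSync(List<Commit> commitsSinceLastSync) {
    Set<String> dropped = new TreeSet<>();
    for (Commit commit : commitsSinceLastSync) {
      if ("drop_partition".equals(commit.operation)) {
        dropped.addAll(commit.partitions);
      }
    }
    return dropped;
  }

  public static void main(String[] args) {
    List<Commit> commits = Arrays.asList(
        new Commit("upsert", "p1", "p2", "p3"),
        new Commit("drop_partition", "p1"),
        new Commit("drop_partition", "p2"));

    System.out.println("latest only:     " + droppedInLatestCommit(commits));
    System.out.println("since last sync: " + droppedSinceLastSync(commits));
  }
}
```

With these three commits the latest-only check reports just `p2`, while the scan over all commits also reports `p1`, which is the partition that would otherwise be left behind in the metastore.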





[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1248166767

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8d100f5793803b053673f0730ea34b7c75e1d41c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322) 
   * 25f03c88cb4d0d18529746053b9575f0574dcfc0 UNKNOWN
   


[GitHub] [hudi] xushiyan merged pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
xushiyan merged PR #6662:
URL: https://github.com/apache/hudi/pull/6662




[GitHub] [hudi] codope commented on a diff in pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
codope commented on code in PR #6662:
URL: https://github.com/apache/hudi/pull/6662#discussion_r972807725


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -158,4 +171,23 @@ public List<PartitionEvent> getPartitionEvents(List<Partition> tablePartitions,
     }
     return events;
   }
+
+  /**
+   * Get Last commit's Metadata.
+   */
+  private static Option<HoodieCommitMetadata> getLatestCommitMetadata(HoodieTableMetaClient metaClient) {

Review Comment:
   Makes sense. Added a util method in TimelineUtils to get the partitions dropped since a given time.





[GitHub] [hudi] xushiyan commented on a diff in pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #6662:
URL: https://github.com/apache/hudi/pull/6662#discussion_r973800037


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -158,4 +163,23 @@ public List<PartitionEvent> getPartitionEvents(List<Partition> tablePartitions,
     }
     return events;
   }
+
+  /**
+   * Get Last commit's Metadata.
+   */
+  private static Option<HoodieCommitMetadata> getLatestCommitMetadata(HoodieTableMetaClient metaClient) {

Review Comment:
   Is this method still needed?



##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineUtils.java:
##########
@@ -57,6 +58,29 @@ public static List<String> getPartitionsWritten(HoodieTimeline timeline) {
     return getAffectedPartitions(timelineToSync);
   }
 
+  /**
+   * Returns partitions that have been deleted or marked for deletion in the given timeline.
+   * Does not include internal operations such as clean in the timeline.
+   */
+  public static List<String> getPartitionsDropped(HoodieTimeline timeline) {

Review Comment:
   nit, name alignment: getPartitionsWritten, getPartitionsDropped, getAffectedPartitions -> align all of these on the getXXXPartitions pattern
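
For context, the behaviour described in the Javadoc above (collect only partitions removed by delete-partition operations, ignoring internal actions such as clean) could be sketched roughly as follows. This is a minimal illustration and not the code in the PR: `SketchInstant`, its fields, and the string constants are hypothetical stand-ins for Hudi's timeline and commit metadata classes, and the method uses the `getDroppedPartitions` spelling from the naming suggestion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one completed instant on a timeline; not a Hudi class.
final class SketchInstant {
  final String timestamp;         // instant time, e.g. "002"
  final String action;            // "commit", "replacecommit", "clean", ...
  final String operationType;     // "INSERT", "DELETE_PARTITION", ...
  final List<String> partitions;  // partitions referenced by this instant

  SketchInstant(String timestamp, String action, String operationType, String... partitions) {
    this.timestamp = timestamp;
    this.action = action;
    this.operationType = operationType;
    this.partitions = Arrays.asList(partitions);
  }
}

final class DroppedPartitionsSketch {
  // Keeps only partitions named by replacecommits whose operation is a
  // partition delete. Other replacecommits and clean instants are skipped.
  static List<String> getDroppedPartitions(List<SketchInstant> timeline) {
    Set<String> dropped = new LinkedHashSet<>(); // de-duplicates, keeps first-seen order
    for (SketchInstant instant : timeline) {
      boolean isPartitionDelete = "replacecommit".equals(instant.action)
          && "DELETE_PARTITION".equals(instant.operationType);
      if (isPartitionDelete) {
        dropped.addAll(instant.partitions);
      }
    }
    return new ArrayList<>(dropped);
  }
}
```

Checking every instant in the range, rather than only the latest one, is what keeps an unrelated later commit from hiding an earlier drop, and keeps a drop from being applied to partitions it never touched.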



##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -83,18 +86,17 @@ public boolean isBootstrap() {
     return metaClient.getTableConfig().getBootstrapBasePath().isPresent();
   }
 
-  public boolean isDropPartition() {
-    try {
-      Option<HoodieCommitMetadata> hoodieCommitMetadata = HoodieTableMetadataUtil.getLatestCommitMetadata(metaClient);
-
-      if (hoodieCommitMetadata.isPresent()
-          && WriteOperationType.DELETE_PARTITION.equals(hoodieCommitMetadata.get().getOperationType())) {
-        return true;
-      }
-    } catch (Exception e) {
-      throw new HoodieSyncException("Failed to get commit metadata", e);
-    }
-    return false;
+  /**
+   * Get the set of dropped partitions since the last synced commit.
+   * If last sync time is not known then consider only active timeline.
+   * Going through archive timeline is a costly operation, and it should be avoided unless some start time is given.
+   */
+  public Set<String> getDroppedPartitions(Option<String> lastCommitTimeSynced) {

Review Comment:
   name alignment: getDroppedPartitions -> getDroppedPartitionsSince(...), and getPartitionsWrittenToSince -> getWrittenPartitionsSince(...)
   
   Also, since this is not perf-sensitive, I'd prefer to align on the return type List<String> as well, for API consistency.
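
To make the suggestion concrete, here is a rough sketch of the proposed shape: a `getDroppedPartitionsSince(...)` method that returns `List<String>` and only consults archived instants when a last-synced time is given, as the Javadoc in the diff describes. Everything here is an assumption for illustration: `SyncClientSketch` and `DropInstant` are hypothetical, `java.util.Optional` stands in for Hudi's `Option`, the two in-memory lists stand in for the archived and active timelines, and instant times are assumed to be equal-length strings that order lexicographically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class SyncClientSketch {
  // Hypothetical record of one delete-partition replacecommit; not a Hudi class.
  static final class DropInstant {
    final String timestamp;
    final List<String> partitions;

    DropInstant(String timestamp, String... partitions) {
      this.timestamp = timestamp;
      this.partitions = Arrays.asList(partitions);
    }
  }

  private final List<DropInstant> archived; // stands in for the archived timeline (costly to read)
  private final List<DropInstant> active;   // stands in for the active timeline

  SyncClientSketch(List<DropInstant> archived, List<DropInstant> active) {
    this.archived = archived;
    this.active = active;
  }

  // Returns partitions dropped strictly after the last synced commit time.
  // With no last synced time, only the active instants are considered.
  List<String> getDroppedPartitionsSince(Optional<String> lastCommitTimeSynced) {
    List<DropInstant> candidates = new ArrayList<>();
    if (lastCommitTimeSynced.isPresent()) {
      candidates.addAll(archived); // only pay for the archive when a start time is known
    }
    candidates.addAll(active);

    Set<String> dropped = new LinkedHashSet<>(); // de-duplicate internally
    for (DropInstant instant : candidates) {
      boolean afterLastSync = !lastCommitTimeSynced.isPresent()
          || instant.timestamp.compareTo(lastCommitTimeSynced.get()) > 0;
      if (afterLastSync) {
        dropped.addAll(instant.partitions);
      }
    }
    return new ArrayList<>(dropped); // List<String>, matching the other partition getters
  }
}
```

De-duplicating in a set internally and copying to a list at the end keeps the `List<String>` return type without returning the same partition twice.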





[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1245090323

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8d100f5793803b053673f0730ea34b7c75e1d41c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322) 
   




[GitHub] [hudi] codope commented on a diff in pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
codope commented on code in PR #6662:
URL: https://github.com/apache/hudi/pull/6662#discussion_r971978788


##########
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java:
##########
@@ -88,8 +88,8 @@
 public class TestHiveSyncTool {
 
   private static final List<Object> SYNC_MODES = Arrays.asList(
-      "hiveql",
-      "hms",
+      /*"hiveql",
+      "hms",*/

Review Comment:
   Good catch. This was just to reduce test time locally. I have removed the comment.





[GitHub] [hudi] hudi-bot commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1250624480

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11322",
       "triggerID" : "8d100f5793803b053673f0730ea34b7c75e1d41c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11385",
       "triggerID" : "25f03c88cb4d0d18529746053b9575f0574dcfc0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11424",
       "triggerID" : "a6b676c8a983aad9ef485d73ec1dc7dd462a055a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05774542e7e99184781292e69747f7b20de24e22",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11488",
       "triggerID" : "05774542e7e99184781292e69747f7b20de24e22",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a6b676c8a983aad9ef485d73ec1dc7dd462a055a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11424) 
   * 05774542e7e99184781292e69747f7b20de24e22 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11488) 
   




[GitHub] [hudi] xushiyan commented on pull request #6662: [HUDI-4832] Fix drop partition meta sync

Posted by GitBox <gi...@apache.org>.
xushiyan commented on PR #6662:
URL: https://github.com/apache/hudi/pull/6662#issuecomment-1249370360

   cc @liujinhui1994: does this solve your problem with [HUDI-4451](https://issues.apache.org/jira/browse/HUDI-4451)? PTAL

