Posted to commits@hudi.apache.org by "yihua (via GitHub)" <gi...@apache.org> on 2023/04/05 22:30:26 UTC

[GitHub] [hudi] yihua opened a new pull request, #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

yihua opened a new pull request, #8388:
URL: https://github.com/apache/hudi/pull/8388

   ### Change Logs
   
   This PR adds a fallback mechanism to the Hive and Glue catalog sync: if the last commit time synced falls before the start of the active timeline of the Hudi table, the sync lists all partition paths on storage and reconciles them against what is in the metastore, instead of reading the archived timeline, which can be expensive in I/O.  The PR also enhances the tests to cover the new logic.
   
   Note that the last commit time synced can fall behind, especially for the Glue catalog, where setting `hoodie.datasource.meta_sync.condition.sync` to `true` is recommended so that the last commit time synced is updated only upon partition changes, to limit the number of versions of data in the Glue catalog.
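
The fallback decision described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `SyncFallbackSketch`, `chooseSyncMode`, and `firstActiveInstant` are hypothetical names, and the sketch assumes instant times are equal-length timestamp strings whose lexicographic order matches chronological order.

```java
import java.util.Optional;

// Hypothetical sketch of the fallback decision; not Hudi's actual classes.
final class SyncFallbackSketch {

  enum SyncMode { LIST_ALL_PARTITIONS, READ_ACTIVE_TIMELINE }

  /**
   * Chooses how to find changed partitions.
   *
   * @param lastCommitTimeSynced last commit time recorded in the metastore, if any
   * @param firstActiveInstant   assumed: earliest instant time in the active timeline
   */
  static SyncMode chooseSyncMode(Optional<String> lastCommitTimeSynced,
                                 String firstActiveInstant) {
    // No synced commit, or the synced commit is older than the active timeline:
    // list every partition on storage rather than reading the archived timeline.
    if (!lastCommitTimeSynced.isPresent()
        || lastCommitTimeSynced.get().compareTo(firstActiveInstant) < 0) {
      return SyncMode.LIST_ALL_PARTITIONS;
    }
    // Otherwise the active timeline alone covers everything since the last sync.
    return SyncMode.READ_ACTIVE_TIMELINE;
  }
}
```

With this sketch, a synced commit time equal to the first active instant still takes the timeline path, since only strictly older commit times trigger the full listing.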
   
   ### Impact
   
   Avoids loading the archived timeline during Hive and Glue sync.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   No documentation update needed.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1498269327

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537638295

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909",
       "triggerID" : "1537295895",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911",
       "triggerID" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16924",
       "triggerID" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b043a8845b6a4d12148b9f45d92638cbe4c1cba7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16927",
       "triggerID" : "b043a8845b6a4d12148b9f45d92638cbe4c1cba7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3f12c145c40172e1fb5139e9bc5a392bd989c859 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16924) 
   * b043a8845b6a4d12148b9f45d92638cbe4c1cba7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16927) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1498394308

   cc @umehrot2 




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537302254

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909",
       "triggerID" : "1537295895",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * b5d1633f7f621d17f14bab4044546568d2b90cd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537576877

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909",
       "triggerID" : "1537295895",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911",
       "triggerID" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16924",
       "triggerID" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3f12c145c40172e1fb5139e9bc5a392bd989c859 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16924) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] slfan1989 commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "slfan1989 (via GitHub)" <gi...@apache.org>.
slfan1989 commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1163442102


##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##########
@@ -258,13 +258,28 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
       lastCommitTimeSynced = syncClient.getLastCommitTimeSynced(tableName);
     }
     LOG.info("Last commit time synced was found to be " + lastCommitTimeSynced.orElse("null"));
-    List<String> writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
-    LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
 
-    // Sync the partitions if needed
-    // find dropped partitions, if any, in the latest commit
-    Set<String> droppedPartitions = syncClient.getDroppedPartitionsSince(lastCommitTimeSynced);
-    boolean partitionsChanged = syncPartitions(tableName, writtenPartitionsSince, droppedPartitions);
+    boolean partitionsChanged;
+    if (!lastCommitTimeSynced.isPresent()
+        || syncClient.getActiveTimeline().isBeforeTimelineStarts(lastCommitTimeSynced.get())) {
+      // If the last commit time synced is before the start of the active timeline,
+      // the Hive sync falls back to list all partitions on storage, instead of
+      // reading active and archived timelines for written partitions.
+      LOG.info("Sync all partitions given the last commit time synced is empty or "
+          + "before the start of the active timeline. Listing all partitions in "
+          + config.getString(META_SYNC_BASE_PATH)
+          + ", file system: " + config.getHadoopFileSystem());
+      partitionsChanged = syncAllPartitions(tableName);
+    } else {
+      List<String> writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
+      LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());

Review Comment:
   LOG.info("Storage partitions scan complete.  Found {}.", writtenPartitionsSince.size());





[GitHub] [hudi] codope commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1181526772


##########
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/HiveTestUtil.java:
##########
@@ -68,6 +68,7 @@
 import org.apache.parquet.hadoop.metadata.CompressionCodecName;
 import org.apache.zookeeper.server.ZooKeeperServer;
 import org.junit.platform.commons.JUnitException;
+import org.mortbay.log.Log;

Review Comment:
   updated.





[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537231983

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b647ef79567b2c25f567dec407f30139065d2fe3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777) 
   * b5d1633f7f621d17f14bab4044546568d2b90cd8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1186910826


##########
azure-pipelines.yml:
##########
@@ -200,7 +203,7 @@ stages:
             inputs:
               mavenPomFile: 'pom.xml'
               goals: 'test'
-              options: $(MVN_OPTS_TEST) -Punit-tests -pl $(JOB4_UT_MODULES)
+              options: $(MVN_OPTS_TEST) -Punit-tests -pl $(JOB4_UT_MODULES) -DfailIfNoTests=false -DwildcardSuites="abc" -Dtest=TestHoodieDeltaStreamer#testForceEmptyMetaSync

Review Comment:
   Yes, this is for debugging purposes only.  Reverted the changes.





[GitHub] [hudi] yihua commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1187059837


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -112,44 +113,101 @@ public MessageType getStorageSchema(boolean includeMetadataField) {
     }
   }
 
+  /**
+   * Gets all relative partitions paths in the Hudi table on storage.
+   *
+   * @return All relative partitions paths.
+   */
+  public List<String> getAllPartitionPathsOnStorage() {
+    HoodieLocalEngineContext engineContext = new HoodieLocalEngineContext(metaClient.getHadoopConf());
+    return FSUtils.getAllPartitionPaths(engineContext,
+        config.getString(META_SYNC_BASE_PATH),
+        config.getBoolean(META_SYNC_USE_FILE_LISTING_FROM_METADATA),
+        config.getBoolean(META_SYNC_ASSUME_DATE_PARTITION));
+  }
+
   public List<String> getWrittenPartitionsSince(Option<String> lastCommitTimeSynced) {
     if (!lastCommitTimeSynced.isPresent()) {
       LOG.info("Last commit time synced is not known, listing all partitions in "
           + config.getString(META_SYNC_BASE_PATH)
           + ",FS :" + config.getHadoopFileSystem());
-      HoodieLocalEngineContext engineContext = new HoodieLocalEngineContext(metaClient.getHadoopConf());
-      return FSUtils.getAllPartitionPaths(engineContext,
-          config.getString(META_SYNC_BASE_PATH),
-          config.getBoolean(META_SYNC_USE_FILE_LISTING_FROM_METADATA),
-          config.getBoolean(META_SYNC_ASSUME_DATE_PARTITION));
+      return getAllPartitionPathsOnStorage();
     } else {
       LOG.info("Last commit time synced is " + lastCommitTimeSynced.get() + ", Getting commits since then");
       return TimelineUtils.getWrittenPartitions(
           TimelineUtils.getCommitsTimelineAfter(metaClient, lastCommitTimeSynced.get()));
     }
   }
 
+  /**
+   * Gets the partition events for changed partitions.
+   * <p>
+   * This compares the list of all partitions of a table stored in the metastore and
+   * on the storage:
+   * (1) Partitions exist in the metastore, but NOT the storage: drops them in the metastore;
+   * (2) Partitions exist on the storage, but NOT the metastore: adds them to the metastore;
+   * (3) Partitions exist in both, but the partition path is different: update them in the metastore.
+   *
+   * @param allPartitionsInMetastore All partitions of a table stored in the metastore.
+   * @param allPartitionsOnStorage   All partitions of a table stored on the storage.
+   * @return partition events for changed partitions.
+   */
+  public List<PartitionEvent> getPartitionEvents(List<Partition> allPartitionsInMetastore,
+                                                 List<String> allPartitionsOnStorage) {
+    Map<String, String> paths = getPartitionValuesToPathMapping(allPartitionsInMetastore);
+    Set<String> partitionsToDrop = new HashSet<>(paths.keySet());
+
+    List<PartitionEvent> events = new ArrayList<>();
+    for (String storagePartition : allPartitionsOnStorage) {
+      Path storagePartitionPath = FSUtils.getPartitionPath(config.getString(META_SYNC_BASE_PATH), storagePartition);
+      String fullStoragePartitionPath = Path.getPathWithoutSchemeAndAuthority(storagePartitionPath).toUri().getPath();
+      // Check if the partition values or if hdfs path is the same
+      List<String> storagePartitionValues = partitionValueExtractor.extractPartitionValuesInPath(storagePartition);
+
+      if (!storagePartitionValues.isEmpty()) {
+        String storageValue = String.join(", ", storagePartitionValues);
+        // Remove partitions that exist on storage from the `partitionsToDrop` set,
+        // so the remaining partitions that exist in the metastore should be dropped
+        partitionsToDrop.remove(storageValue);
+        if (!paths.containsKey(storageValue)) {
+          events.add(PartitionEvent.newPartitionAddEvent(storagePartition));
+        } else if (!paths.get(storageValue).equals(fullStoragePartitionPath)) {
+          events.add(PartitionEvent.newPartitionUpdateEvent(storagePartition));
+        }
+      }
+    }
+
+    partitionsToDrop.forEach(storageValue -> {
+      String storagePath = paths.get(storageValue);
+      try {
+        String relativePath = FSUtils.getRelativePartitionPath(
+            metaClient.getBasePathV2(), new CachingPath(storagePath));
+        events.add(PartitionEvent.newPartitionDropEvent(relativePath));
+      } catch (IllegalArgumentException e) {
+        LOG.error("Cannot parse the path stored in the metastore, ignoring it for "
+            + "generating DROP partition event: \"" + storagePath + "\".", e);

Review Comment:
   In normal cases, this should not happen.  If a user makes a mistake, HMS may hold a table in an inconsistent state where some partitions have paths that do not belong to the table's base path.  Before this patch, the meta sync could still succeed in such a case.  To keep the same behavior as before, we allow the meta sync to succeed and log error messages for such partitions.
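
The three-way comparison documented in the Javadoc of `getPartitionEvents` above can be illustrated with plain collections. This is a simplified sketch under stated assumptions, not the code from the PR: `PartitionDiffSketch` and `diff` are hypothetical names, both inputs are assumed to be maps from a partition key to its full path, and events are represented as strings rather than `PartitionEvent` objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the metastore-vs-storage partition diff.
final class PartitionDiffSketch {

  /**
   * @param metastore partition key -> path recorded in the metastore
   * @param storage   partition key -> path found on storage
   * @return events such as "ADD k", "UPDATE k", "DROP k"
   */
  static List<String> diff(Map<String, String> metastore, Map<String, String> storage) {
    // Start by assuming every metastore partition must be dropped,
    // then remove the ones that still exist on storage.
    Set<String> toDrop = new LinkedHashSet<>(metastore.keySet());
    List<String> events = new ArrayList<>();

    for (Map.Entry<String, String> entry : storage.entrySet()) {
      String key = entry.getKey();
      toDrop.remove(key);
      if (!metastore.containsKey(key)) {
        // On storage only: add it to the metastore.
        events.add("ADD " + key);
      } else if (!metastore.get(key).equals(entry.getValue())) {
        // In both, but the recorded path differs: update the metastore.
        events.add("UPDATE " + key);
      }
    }

    // Whatever remains exists only in the metastore: drop it.
    for (String key : toDrop) {
      events.add("DROP " + key);
    }
    return events;
  }
}
```

The sketch omits the case discussed in the comment above, where a metastore path cannot be resolved relative to the base path; the PR logs an error for such partitions and skips the DROP event rather than failing the sync.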





[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1529675089

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149) 
   * b647ef79567b2c25f567dec407f30139065d2fe3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537726867

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909",
       "triggerID" : "1537295895",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911",
       "triggerID" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16924",
       "triggerID" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b043a8845b6a4d12148b9f45d92638cbe4c1cba7",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16927",
       "triggerID" : "b043a8845b6a4d12148b9f45d92638cbe4c1cba7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b043a8845b6a4d12148b9f45d92638cbe4c1cba7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16927) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537252078

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b5d1633f7f621d17f14bab4044546568d2b90cd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537612605

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909",
       "triggerID" : "1537295895",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911",
       "triggerID" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16924",
       "triggerID" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b043a8845b6a4d12148b9f45d92638cbe4c1cba7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b043a8845b6a4d12148b9f45d92638cbe4c1cba7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3f12c145c40172e1fb5139e9bc5a392bd989c859 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16924) 
   * b043a8845b6a4d12148b9f45d92638cbe4c1cba7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] xushiyan commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1159152494


##########
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/HiveTestUtil.java:
##########
@@ -68,6 +68,7 @@
 import org.apache.parquet.hadoop.metadata.CompressionCodecName;
 import org.apache.zookeeper.server.ZooKeeperServer;
 import org.junit.platform.commons.JUnitException;
+import org.mortbay.log.Log;

Review Comment:
   wrong import?





[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1529618941

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149) 
   * b647ef79567b2c25f567dec407f30139065d2fe3 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] codope commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1181526980


##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##########
@@ -258,13 +258,28 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
       lastCommitTimeSynced = syncClient.getLastCommitTimeSynced(tableName);
     }
     LOG.info("Last commit time synced was found to be " + lastCommitTimeSynced.orElse("null"));
-    List<String> writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
-    LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
 
-    // Sync the partitions if needed
-    // find dropped partitions, if any, in the latest commit
-    Set<String> droppedPartitions = syncClient.getDroppedPartitionsSince(lastCommitTimeSynced);
-    boolean partitionsChanged = syncPartitions(tableName, writtenPartitionsSince, droppedPartitions);
+    boolean partitionsChanged;
+    if (!lastCommitTimeSynced.isPresent()
+        || syncClient.getActiveTimeline().isBeforeTimelineStarts(lastCommitTimeSynced.get())) {
+      // If the last commit time synced is before the start of the active timeline,
+      // the Hive sync falls back to list all partitions on storage, instead of
+      // reading active and archived timelines for written partitions.
+      LOG.info("Sync all partitions given the last commit time synced is empty or "
+          + "before the start of the active timeline. Listing all partitions in "
+          + config.getString(META_SYNC_BASE_PATH)
+          + ", file system: " + config.getHadoopFileSystem());
+      partitionsChanged = syncAllPartitions(tableName);
+    } else {
+      List<String> writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
+      LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());

Review Comment:
   I am going to change only the logs introduced in this PR. I think we should do the broader logging cleanup in a separate PR.
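
For readers following the hunk above, here is a minimal, self-contained sketch of the fallback decision and of the storage-versus-metastore comparison described in the PR summary. It is not the Hudi implementation: the class `SyncFallbackSketch`, its methods, and `PartitionDiff` are hypothetical names, `java.util.Optional` stands in for the option type used in `HiveSyncTool`, and the check against the active timeline is approximated by comparing instant-time strings lexicographically, which assumes fixed-width timestamp strings.

```java
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

public class SyncFallbackSketch {

  /** Hypothetical holder for the partition changes to apply to the metastore. */
  public static final class PartitionDiff {
    public final Set<String> toAdd;
    public final Set<String> toDrop;

    PartitionDiff(Set<String> toAdd, Set<String> toDrop) {
      this.toAdd = toAdd;
      this.toDrop = toDrop;
    }
  }

  /**
   * Mirrors the condition in the hunk: fall back to listing all partitions when
   * nothing was synced yet, or when the last synced commit is older than the
   * first instant of the active timeline (assumption: fixed-width timestamps,
   * so string order matches time order).
   */
  public static boolean shouldListAllPartitions(Optional<String> lastCommitTimeSynced,
                                                String firstActiveInstant) {
    return !lastCommitTimeSynced.isPresent()
        || lastCommitTimeSynced.get().compareTo(firstActiveInstant) < 0;
  }

  /** Partitions only on storage are added; partitions only in the metastore are dropped. */
  public static PartitionDiff diff(Set<String> onStorage, Set<String> inMetastore) {
    Set<String> toAdd = new TreeSet<>(onStorage);
    toAdd.removeAll(inMetastore);
    Set<String> toDrop = new TreeSet<>(inMetastore);
    toDrop.removeAll(onStorage);
    return new PartitionDiff(toAdd, toDrop);
  }

  public static void main(String[] args) {
    // Illustrative values only; the instants and partition paths are made up.
    boolean fallback = shouldListAllPartitions(Optional.of("20230101000000"), "20230401000000");
    System.out.println("fall back to full listing: " + fallback);

    Set<String> storage = new TreeSet<>();
    storage.add("2023/04/01");
    storage.add("2023/04/02");
    Set<String> metastore = new TreeSet<>();
    metastore.add("2023/04/01");
    metastore.add("2023/03/31");

    PartitionDiff d = diff(storage, metastore);
    System.out.println("add: " + d.toAdd + ", drop: " + d.toDrop);
  }
}
```

The point of the design is that the full listing touches only partition paths on storage and in the metastore, so the archived timeline never has to be read.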





[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537230992

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b647ef79567b2c25f567dec407f30139065d2fe3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777) 
   * b5d1633f7f621d17f14bab4044546568d2b90cd8 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537542366

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909",
       "triggerID" : "1537295895",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911",
       "triggerID" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911) 
   * 3f12c145c40172e1fb5139e9bc5a392bd989c859 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] codope merged pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope merged PR #8388:
URL: https://github.com/apache/hudi/pull/8388




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1498384417

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1529721614

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b647ef79567b2c25f567dec407f30139065d2fe3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1498263847

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537311424

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909",
       "triggerID" : "1537295895",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b5d1633f7f621d17f14bab4044546568d2b90cd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909) 
   * 18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537323167

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909",
       "triggerID" : "1537295895",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911",
       "triggerID" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b5d1633f7f621d17f14bab4044546568d2b90cd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909) 
   * 18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537543698

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909",
       "triggerID" : "1537295895",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911",
       "triggerID" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16924",
       "triggerID" : "3f12c145c40172e1fb5139e9bc5a392bd989c859",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911) 
   * 3f12c145c40172e1fb5139e9bc5a392bd989c859 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16924) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] slfan1989 commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "slfan1989 (via GitHub)" <gi...@apache.org>.
slfan1989 commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1163441854


##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##########
@@ -258,13 +258,28 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
       lastCommitTimeSynced = syncClient.getLastCommitTimeSynced(tableName);
     }
     LOG.info("Last commit time synced was found to be " + lastCommitTimeSynced.orElse("null"));
-    List<String> writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
-    LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
 
-    // Sync the partitions if needed
-    // find dropped partitions, if any, in the latest commit
-    Set<String> droppedPartitions = syncClient.getDroppedPartitionsSince(lastCommitTimeSynced);
-    boolean partitionsChanged = syncPartitions(tableName, writtenPartitionsSince, droppedPartitions);
+    boolean partitionsChanged;
+    if (!lastCommitTimeSynced.isPresent()
+        || syncClient.getActiveTimeline().isBeforeTimelineStarts(lastCommitTimeSynced.get())) {
+      // If the last commit time synced is before the start of the active timeline,
+      // the Hive sync falls back to list all partitions on storage, instead of
+      // reading active and archived timelines for written partitions.
+      LOG.info("Sync all partitions given the last commit time synced is empty or "
+          + "before the start of the active timeline. Listing all partitions in "
+          + config.getString(META_SYNC_BASE_PATH)
+          + ", file system: " + config.getHadoopFileSystem());
+      partitionsChanged = syncAllPartitions(tableName);
+    } else {
+      List<String> writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
+      LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());

Review Comment:
   Our logging has moved to slf4j; can we use `{}` placeholders here instead of string concatenation?





[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537353747

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16149",
       "triggerID" : "0b0c61e41e4f7446eb75803e3e0bff4e01afe2ad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16777",
       "triggerID" : "b647ef79567b2c25f567dec407f30139065d2fe3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16902",
       "triggerID" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b5d1633f7f621d17f14bab4044546568d2b90cd8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16909",
       "triggerID" : "1537295895",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911",
       "triggerID" : "18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 18e3f3140ba960834c0a3fb4285c7eabb7bbb0d7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16911) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] codope commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1186807328


##########
azure-pipelines.yml:
##########
@@ -200,7 +203,7 @@ stages:
             inputs:
               mavenPomFile: 'pom.xml'
               goals: 'test'
-              options: $(MVN_OPTS_TEST) -Punit-tests -pl $(JOB4_UT_MODULES)
+              options: $(MVN_OPTS_TEST) -Punit-tests -pl $(JOB4_UT_MODULES) -DfailIfNoTests=false -DwildcardSuites="abc" -Dtest=TestHoodieDeltaStreamer#testForceEmptyMetaSync

Review Comment:
   I'm assuming this is just for debugging purposes. Do we still need changes in this file?





[GitHub] [hudi] yihua commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on PR #8388:
URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537295895

   @hudi-bot run azure




[GitHub] [hudi] codope commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1186990025


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -112,44 +113,101 @@ public MessageType getStorageSchema(boolean includeMetadataField) {
     }
   }
 
+  /**
+   * Gets all relative partitions paths in the Hudi table on storage.
+   *
+   * @return All relative partitions paths.
+   */
+  public List<String> getAllPartitionPathsOnStorage() {
+    HoodieLocalEngineContext engineContext = new HoodieLocalEngineContext(metaClient.getHadoopConf());
+    return FSUtils.getAllPartitionPaths(engineContext,
+        config.getString(META_SYNC_BASE_PATH),
+        config.getBoolean(META_SYNC_USE_FILE_LISTING_FROM_METADATA),
+        config.getBoolean(META_SYNC_ASSUME_DATE_PARTITION));
+  }
+
   public List<String> getWrittenPartitionsSince(Option<String> lastCommitTimeSynced) {
     if (!lastCommitTimeSynced.isPresent()) {
       LOG.info("Last commit time synced is not known, listing all partitions in "
           + config.getString(META_SYNC_BASE_PATH)
           + ",FS :" + config.getHadoopFileSystem());
-      HoodieLocalEngineContext engineContext = new HoodieLocalEngineContext(metaClient.getHadoopConf());
-      return FSUtils.getAllPartitionPaths(engineContext,
-          config.getString(META_SYNC_BASE_PATH),
-          config.getBoolean(META_SYNC_USE_FILE_LISTING_FROM_METADATA),
-          config.getBoolean(META_SYNC_ASSUME_DATE_PARTITION));
+      return getAllPartitionPathsOnStorage();
     } else {
       LOG.info("Last commit time synced is " + lastCommitTimeSynced.get() + ", Getting commits since then");
       return TimelineUtils.getWrittenPartitions(
           TimelineUtils.getCommitsTimelineAfter(metaClient, lastCommitTimeSynced.get()));
     }
   }
 
+  /**
+   * Gets the partition events for changed partitions.
+   * <p>
+   * This compares the list of all partitions of a table stored in the metastore and
+   * on the storage:
+   * (1) Partitions exist in the metastore, but NOT the storage: drops them in the metastore;
+   * (2) Partitions exist on the storage, but NOT the metastore: adds them to the metastore;
+   * (3) Partitions exist in both, but the partition path is different: update them in the metastore.
+   *
+   * @param allPartitionsInMetastore All partitions of a table stored in the metastore.
+   * @param allPartitionsOnStorage   All partitions of a table stored on the storage.
+   * @return partition events for changed partitions.
+   */
+  public List<PartitionEvent> getPartitionEvents(List<Partition> allPartitionsInMetastore,
+                                                 List<String> allPartitionsOnStorage) {
+    Map<String, String> paths = getPartitionValuesToPathMapping(allPartitionsInMetastore);
+    Set<String> partitionsToDrop = new HashSet<>(paths.keySet());
+
+    List<PartitionEvent> events = new ArrayList<>();
+    for (String storagePartition : allPartitionsOnStorage) {
+      Path storagePartitionPath = FSUtils.getPartitionPath(config.getString(META_SYNC_BASE_PATH), storagePartition);
+      String fullStoragePartitionPath = Path.getPathWithoutSchemeAndAuthority(storagePartitionPath).toUri().getPath();
+      // Check if the partition values or if hdfs path is the same
+      List<String> storagePartitionValues = partitionValueExtractor.extractPartitionValuesInPath(storagePartition);
+
+      if (!storagePartitionValues.isEmpty()) {
+        String storageValue = String.join(", ", storagePartitionValues);
+        // Remove partitions that exist on storage from the `partitionsToDrop` set,
+        // so the remaining partitions that exist in the metastore should be dropped
+        partitionsToDrop.remove(storageValue);
+        if (!paths.containsKey(storageValue)) {
+          events.add(PartitionEvent.newPartitionAddEvent(storagePartition));
+        } else if (!paths.get(storageValue).equals(fullStoragePartitionPath)) {
+          events.add(PartitionEvent.newPartitionUpdateEvent(storagePartition));
+        }
+      }
+    }
+
+    partitionsToDrop.forEach(storageValue -> {
+      String storagePath = paths.get(storageValue);
+      try {
+        String relativePath = FSUtils.getRelativePartitionPath(
+            metaClient.getBasePathV2(), new CachingPath(storagePath));
+        events.add(PartitionEvent.newPartitionDropEvent(relativePath));
+      } catch (IllegalArgumentException e) {
+        LOG.error("Cannot parse the path stored in the metastore, ignoring it for "
+            + "generating DROP partition event: \"" + storagePath + "\".", e);

Review Comment:
   Makes sense to not drop the partition in this case. But, just curious, what can cause this scenario?
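
For readers following the diff, the three cases listed in the `getPartitionEvents` Javadoc can be sketched in isolation. This is a minimal illustration under stated assumptions, not the PR's code: the class `PartitionDiffSketch`, the method `diff`, and the string-encoded events are hypothetical stand-ins for `HoodieSyncClient`, `getPartitionEvents`, and `PartitionEvent`, and both inputs are assumed to be already reduced to a map from partition value to full path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartitionDiffSketch {

  // Hypothetical simplification: each event is encoded as "<TYPE> <partition value>".
  // Both maps are keyed by partition value and hold the full partition path.
  static List<String> diff(Map<String, String> metastore, Map<String, String> storage) {
    // Start by assuming every metastore partition must be dropped,
    // then remove the ones that are still present on storage.
    Set<String> toDrop = new LinkedHashSet<>(metastore.keySet());
    List<String> events = new ArrayList<>();

    for (Map.Entry<String, String> entry : storage.entrySet()) {
      String value = entry.getKey();
      toDrop.remove(value);
      if (!metastore.containsKey(value)) {
        // On storage but not in the metastore: add it.
        events.add("ADD " + value);
      } else if (!metastore.get(value).equals(entry.getValue())) {
        // In both, but the recorded path differs: update it.
        events.add("UPDATE " + value);
      }
    }

    // Whatever remains exists only in the metastore: drop it.
    for (String value : toDrop) {
      events.add("DROP " + value);
    }
    return events;
  }

  public static void main(String[] args) {
    Map<String, String> metastore = new LinkedHashMap<>();
    metastore.put("2023-04-01", "/tbl/2023-04-01");
    metastore.put("2023-04-02", "/old/2023-04-02");
    metastore.put("2023-04-03", "/tbl/2023-04-03");

    Map<String, String> storage = new LinkedHashMap<>();
    storage.put("2023-04-01", "/tbl/2023-04-01");
    storage.put("2023-04-02", "/tbl/2023-04-02");
    storage.put("2023-04-04", "/tbl/2023-04-04");

    // Prints: [UPDATE 2023-04-02, ADD 2023-04-04, DROP 2023-04-03]
    System.out.println(diff(metastore, storage));
  }
}
```

The sketch leaves out the path parsing that the diff wraps in a `try`/`catch`; it only shows how a single pass over the storage listing plus a "remaining keys" set yields the add, update, and drop events.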


