Posted to commits@hudi.apache.org by "weimingdiit (via GitHub)" <gi...@apache.org> on 2023/03/27 11:02:25 UTC

[GitHub] [hudi] weimingdiit opened a new pull request, #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

weimingdiit opened a new pull request, #8301:
URL: https://github.com/apache/hudi/pull/8301

   …en partitions are lost
   
   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1568922646

   ## CI report:
   
   * b33260a57b46cae12c2779fbf4279503bb277fff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15947) 
   * 684a048842d92c5282d400e0e042680644fa8453 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17484) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] weimingdiit commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1160487484


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   `hoodie.datasource.hive_sync.incremental` -- this name is OK.
   
   I think the usage scenario of this parameter is:
   true:
   When HMS holds fewer partitions than actually exist on the file system, an alignment is needed to add the missing partitions to HMS. The parameter only needs to be turned on once, when the metadata in HMS has to be completed. When the parameter is true, lastCommitTimeSynced stays empty, so the sync is a full alignment operation.
   `
       // Get the last time we successfully synced partitions
       Option<String> lastCommitTimeSynced = Option.empty();
       if (tableExists && !config.getBoolean(META_SYNC_PARTITION_FIXMODE)) {
         lastCommitTimeSynced = syncClient.getLastCommitTimeSynced(tableName);
       }
   `
   
   false:
   Under normal circumstances, lastCommitTimeSynced is used as the baseline, and only the partitions created after lastCommitTimeSynced are synchronized.
   
   So the default should probably be false.
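
The decision described in this comment can be sketched in isolation. This is only an illustration under stated assumptions: `SyncBaselineSketch`, `resolveBaseline`, and the `fullPartitionSync` flag are hypothetical names, and `java.util.Optional` stands in for Hudi's own `Option` type so the example runs without Hudi on the classpath.

```java
import java.util.Optional;

// Hypothetical stand-in for the sync tool's decision logic; not Hudi's actual class.
final class SyncBaselineSketch {

  /**
   * Returns the commit time to use as the incremental-sync baseline,
   * or Optional.empty() when every partition should be re-synced.
   */
  static Optional<String> resolveBaseline(boolean tableExists,
                                          boolean fullPartitionSync,
                                          Optional<String> lastCommitTimeSynced) {
    // A new table has no baseline yet, and the repair flag deliberately ignores it.
    if (!tableExists || fullPartitionSync) {
      return Optional.empty();
    }
    return lastCommitTimeSynced;
  }

  public static void main(String[] args) {
    Optional<String> stored = Optional.of("20230327110225");
    // Normal run: the stored commit time is the baseline.
    System.out.println(resolveBaseline(true, false, stored));
    // Repair run: the baseline is dropped, so all partitions are compared.
    System.out.println(resolveBaseline(true, true, stored));
  }
}
```

With the flag off, the stored commit time is passed through unchanged, which matches the "false" case above. With the flag on, the caller sees an empty baseline and would fall back to a full partition comparison.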



[GitHub] [hudi] weimingdiit commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1539584774

   > @weimingdiit I've landed an improvement on the Hive sync: #8388. Could you rebase your PR on the latest master?
   
   ok


[GitHub] [hudi] hudi-bot commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1486244631

   ## CI report:
   
   * b33260a57b46cae12c2779fbf4279503bb277fff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15947) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] weimingdiit commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1159369021


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   @danny0405 Hi, could you take another look?



[GitHub] [hudi] yihua commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1187759698


##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/HiveSyncProcedure.scala:
##########
@@ -41,7 +41,8 @@ class HiveSyncProcedure extends BaseProcedure with ProcedureBuilder
     ProcedureParameter.optional(5, "mode", DataTypes.StringType, ""),
     ProcedureParameter.optional(6, "partition_fields", DataTypes.StringType, ""),
     ProcedureParameter.optional(7, "partition_extractor_class", DataTypes.StringType, ""),
-    ProcedureParameter.optional(8, "strategy", DataTypes.StringType, "")
+    ProcedureParameter.optional(8, "strategy", DataTypes.StringType, ""),
+    ProcedureParameter.optional(9, "partition_fixmode", DataTypes.StringType, "")

Review Comment:
   Let's fix the naming here too.



##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   If the config is named `incremental`, the default should be `true`, i.e., "lastCommitTimeSynced is used as the baseline to synchronize the newly generated partitions after lastCommitTimeSynced".



##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -226,6 +231,8 @@ public static class HoodieSyncConfigParams {
     public Boolean isConditionalSync;
     @Parameter(names = {"--spark-version"}, description = "The spark version")
     public String sparkVersion;
+    @Parameter(names = {"--partition-fixmode"}, description = "Implement a full partition sync operation when partitions are lost.")

Review Comment:
   Same here for naming.



##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/HiveSyncProcedure.scala:
##########
@@ -66,6 +67,7 @@ class HiveSyncProcedure extends BaseProcedure with ProcedureBuilder
     val partitionFields = getArgValueOrDefault(args, PARAMETERS(6)).get.asInstanceOf[String]
     val partitionExtractorClass = getArgValueOrDefault(args, PARAMETERS(7)).get.asInstanceOf[String]
     val strategy = getArgValueOrDefault(args, PARAMETERS(8)).get.asInstanceOf[String]
+    val partitionFixMode = getArgValueOrDefault(args, PARAMETERS(9)).get.asInstanceOf[String]

Review Comment:
   variable naming too.



##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   Agree that `hoodie.datasource.hive_sync.incremental` is a better name.  Since this can also apply to Glue Catalog sync, can we name it with the `meta.sync` prefix, i.e., `hoodie.meta.sync.incremental`?  @xushiyan what is the naming convention?  I see different prefixes are used, e.g., `hoodie.meta.sync`, `hoodie.datasource.meta_sync`, `hoodie.meta_sync`.
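
The thread does not settle on a prefix here. One way to keep both spellings working while a preferred name is chosen is to resolve the preferred key first and fall back to an older key. The sketch below is a hypothetical, self-contained helper (`SyncKeyAliasSketch` and `isIncremental` are illustrative names, not Hudi APIs); the two key strings are the ones proposed in this review, and the `true` default follows the reviewers' suggestion.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hudi: resolves a boolean option that may be
// set under a preferred key or under an older alternative key.
final class SyncKeyAliasSketch {
  static final String PREFERRED_KEY = "hoodie.meta.sync.incremental";
  static final List<String> ALTERNATIVE_KEYS =
      Collections.singletonList("hoodie.datasource.hive_sync.incremental");

  static boolean isIncremental(Map<String, String> props) {
    // The preferred key wins whenever it is present.
    if (props.containsKey(PREFERRED_KEY)) {
      return Boolean.parseBoolean(props.get(PREFERRED_KEY));
    }
    // Otherwise honor any older spelling the user may still have configured.
    for (String key : ALTERNATIVE_KEYS) {
      if (props.containsKey(key)) {
        return Boolean.parseBoolean(props.get(key));
      }
    }
    // Assumed default, as suggested in the review: incremental sync stays on.
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("hoodie.datasource.hive_sync.incremental", "false");
    System.out.println(isIncremental(props));
  }
}
```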



[GitHub] [hudi] hudi-bot commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1484975160

   ## CI report:
   
   * b33260a57b46cae12c2779fbf4279503bb277fff UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] weimingdiit commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1569404992

   @yihua @danny0405 Thanks for your review and edits.


[GitHub] [hudi] yihua commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1210709111


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   @weimingdiit Understood.  I think there's a communication gap.  I revised the code based on our discussion.  Could you check if it aligns with what you think?



[GitHub] [hudi] weimingdiit commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1486806361

   @xushiyan @danny0405 @yihua @nsivabalan  Hi, could you please take a look?


[GitHub] [hudi] danny0405 commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1160411416


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   How about `hoodie.datasource.hive_sync.incremental`, with a default of true?



[GitHub] [hudi] weimingdiit commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1191865864


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   @yihua @danny0405 
   Maybe I didn't describe it clearly. The purpose of this PR is to provide a tool parameter for the case where synced partition metadata is found to be lost. **The function of this parameter is not incremental partition sync; it is a switch that controls whether to perform a sync alignment operation for all partitions.**
   
   The current code logic does incremental synchronization based on lastCommitTimeSynced. If the parameter is set to true, then according to the code the syncAllPartitions method is used every time to synchronize all partitions, which is unnecessary.
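
To make the "alignment for all partitions" idea concrete, here is a minimal sketch of the comparison it implies: list the partitions on storage, list the partitions registered in the metastore, and add whatever is missing. `PartitionAlignSketch` and `missingInMetastore` are hypothetical names, and the partition paths are made-up sample values; this is not the PR's actual implementation.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of a full partition alignment; not Hudi code.
final class PartitionAlignSketch {

  /** Partitions that exist on storage but are not registered in the metastore. */
  static Set<String> missingInMetastore(Collection<String> storagePartitions,
                                        Collection<String> metastorePartitions) {
    // Start from everything on storage, then drop what the metastore already knows.
    Set<String> missing = new TreeSet<>(storagePartitions);
    missing.removeAll(metastorePartitions);
    return missing;
  }

  public static void main(String[] args) {
    Set<String> missing = missingInMetastore(
        Arrays.asList("2023/03/25", "2023/03/26", "2023/03/27"),
        Arrays.asList("2023/03/25", "2023/03/27"));
    // These are the partitions a repair run would need to add back.
    System.out.println(missing);
  }
}
```

An incremental sync only looks at partitions touched after the last synced commit, so a partition dropped from the metastore earlier would never show up in that comparison. That is why a one-off full comparison is useful as a repair step, and why running it on every sync would be wasteful.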



[GitHub] [hudi] danny0405 commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1159374903


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   `fixmode` is not a good name; I didn't quite get what it really means.



[GitHub] [hudi] danny0405 merged pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 merged PR #8301:
URL: https://github.com/apache/hudi/pull/8301


[GitHub] [hudi] weimingdiit commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1486121696

   @hudi-bot run azure


[GitHub] [hudi] weimingdiit commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1159517683


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   Maybe call it `partition_align`? How about this?



[GitHub] [hudi] danny0405 commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1151366182


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   Rename `hoodie.datasource.hive_sync.partition_fixmode` to `hoodie.datasource.hive_sync.incremental`? The default value of this option would then be true.
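
To make the polarity of the two names concrete, here is a minimal sketch. It assumes the suggested key `hoodie.datasource.hive_sync.incremental` is simply the negation of the PR's `hoodie.datasource.hive_sync.partition_fixmode`; the class `SyncModeFlags` and its methods are hypothetical and not part of Hudi.

```java
import java.util.Properties;

// Hypothetical helper, not Hudi code: shows how the two proposed flag names relate.
final class SyncModeFlags {
  static final String FIXMODE_KEY = "hoodie.datasource.hive_sync.partition_fixmode";
  static final String INCREMENTAL_KEY = "hoodie.datasource.hive_sync.incremental";

  // Name used in the PR: defaults to false; true requests a full partition sync.
  static boolean fullSyncUnderFixmode(Properties props) {
    return Boolean.parseBoolean(props.getProperty(FIXMODE_KEY, "false"));
  }

  // Name suggested in review: defaults to true; false requests a full partition sync.
  static boolean fullSyncUnderIncremental(Properties props) {
    return !Boolean.parseBoolean(props.getProperty(INCREMENTAL_KEY, "true"));
  }

  public static void main(String[] args) {
    Properties defaults = new Properties();
    System.out.println(fullSyncUnderFixmode(defaults));
    System.out.println(fullSyncUnderIncremental(defaults));
  }
}
```

With no keys set, both methods return false, so under either naming the default behavior is the incremental sync.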





[GitHub] [hudi] weimingdiit commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1160487484


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   `hoodie.datasource.hive_sync.incremental`: this name is OK.
   
   I think the usage scenario for this parameter is:
   
   true:
   When the partition metadata in HMS turns out to have fewer partitions than actually exist on the file system, an alignment is needed to fill in the partitions missing from HMS. The parameter only needs to be turned on once, when the HMS metadata has to be completed. When it is true, lastCommitTimeSynced stays empty, so the sync is a full alignment.
   
   ```
       // Get the last time we successfully synced partitions
       Option<String> lastCommitTimeSynced = Option.empty();
       if (tableExists && !config.getBoolean(META_SYNC_PARTITION_FIXMODE)) {
         lastCommitTimeSynced = syncClient.getLastCommitTimeSynced(tableName);
       }
   ```
   
   false:
   Under normal circumstances, lastCommitTimeSynced is the baseline, and only partitions created after it are synchronized.
   
   So the default should probably be false.
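
The two cases can be sketched in a self-contained form. This is an illustration only, not the Hudi implementation: `java.util.Optional` stands in for Hudi's `Option`, and `SyncBaseline`, `resolve`, and the `storedLastCommit` parameter are hypothetical names replacing the `syncClient.getLastCommitTimeSynced(tableName)` lookup.

```java
import java.util.Optional;

// Hypothetical sketch, not Hudi code.
final class SyncBaseline {
  /**
   * Returns the commit time to sync from, or empty to request a full partition alignment.
   * storedLastCommit stands in for the value the sync client would read from the metastore.
   */
  static Optional<String> resolve(boolean tableExists, boolean fixMode, Optional<String> storedLastCommit) {
    if (tableExists && !fixMode) {
      // Normal case: continue incrementally from the last synced commit.
      return storedLastCommit;
    }
    // New table, or fix mode enabled: no baseline, so every partition is compared.
    return Optional.empty();
  }

  public static void main(String[] args) {
    Optional<String> stored = Optional.of("20230327110225"); // made-up commit time
    System.out.println(resolve(true, false, stored).isPresent());
    System.out.println(resolve(true, true, stored).isPresent());
  }
}
```

An empty result is what makes the sync fall back to comparing all partitions, which is why the flag only needs to be enabled for a single repair run.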





[GitHub] [hudi] hudi-bot commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1569222629

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943",
       "triggerID" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15947",
       "triggerID" : "1486121696",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "684a048842d92c5282d400e0e042680644fa8453",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17484",
       "triggerID" : "684a048842d92c5282d400e0e042680644fa8453",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d90cad89063e1e8248b6828cdc1521bfc7000eb0",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17487",
       "triggerID" : "d90cad89063e1e8248b6828cdc1521bfc7000eb0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d90cad89063e1e8248b6828cdc1521bfc7000eb0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17487) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1569006785

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943",
       "triggerID" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15947",
       "triggerID" : "1486121696",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "684a048842d92c5282d400e0e042680644fa8453",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17484",
       "triggerID" : "684a048842d92c5282d400e0e042680644fa8453",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d90cad89063e1e8248b6828cdc1521bfc7000eb0",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17487",
       "triggerID" : "d90cad89063e1e8248b6828cdc1521bfc7000eb0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 684a048842d92c5282d400e0e042680644fa8453 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17484) 
   * d90cad89063e1e8248b6828cdc1521bfc7000eb0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17487) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1486146494

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943",
       "triggerID" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15947",
       "triggerID" : "1486121696",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * b33260a57b46cae12c2779fbf4279503bb277fff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15947) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] weimingdiit commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1537812570

   @XuQianJin-Stars Hi, could you please review this?




[GitHub] [hudi] hudi-bot commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1485084863

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943",
       "triggerID" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b33260a57b46cae12c2779fbf4279503bb277fff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] weimingdiit commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1191865864


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   @yihua @danny0405 
   Maybe I didn't describe it clearly. The purpose of this PR is to provide a tool parameter that controls whether a full partition synchronization (alignment) is performed when synced partition metadata is found to be lost.
   
   The current code logic performs incremental synchronization based on lastCommitTimeSynced. If the parameter is set to true, then according to the code the syncAllPartitions method is used on every sync to synchronize all partitions, which is unnecessary.
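
As a rough illustration of the cost difference, the sketch below selects which partitions a sync run has to look at. It is a simplification and not Hudi's actual logic: `PartitionsToCheck`, `select`, and the per-partition "last write" map are hypothetical, and it assumes commit times are fixed-width timestamp strings that compare correctly as plain strings.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch, not Hudi code.
final class PartitionsToCheck {
  /**
   * With a baseline, only partitions written after it are returned (incremental sync).
   * Without a baseline, every partition is returned (full sync).
   */
  static List<String> select(Map<String, String> lastWriteByPartition, Optional<String> lastCommitTimeSynced) {
    return lastWriteByPartition.entrySet().stream()
        .filter(e -> !lastCommitTimeSynced.isPresent()
            || e.getValue().compareTo(lastCommitTimeSynced.get()) > 0)
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, String> writes = new TreeMap<>(); // made-up partitions and commit times
    writes.put("dt=2023-03-25", "20230325090000");
    writes.put("dt=2023-03-26", "20230326090000");
    writes.put("dt=2023-03-27", "20230327090000");
    System.out.println(select(writes, Optional.of("20230326090000"))); // [dt=2023-03-27]
    System.out.println(select(writes, Optional.empty())); // all three partitions
  }
}
```

With the flag left on permanently, every run would take the second path and walk all partitions, even when only the newest one changed.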





[GitHub] [hudi] hudi-bot commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1484984601

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943",
       "triggerID" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b33260a57b46cae12c2779fbf4279503bb277fff Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] weimingdiit commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on code in PR #8301:
URL: https://github.com/apache/hudi/pull/8301#discussion_r1154083727


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConfig {
       .defaultValue("")
       .withDocumentation("The spark version used when syncing with a metastore.");
 
+  public static final ConfigProperty<String> META_SYNC_PARTITION_FIXMODE = ConfigProperty
+      .key("hoodie.datasource.hive_sync.partition_fixmode")
+      .defaultValue("false")
+      .withDocumentation("Implement a full partition sync operation when partitions are lost.");

Review Comment:
   If `hoodie.datasource.hive_sync.partition_fixmode` is true, lastCommitTimeSynced will be empty, so all Hive partitions are fetched for comparison and a full partition metadata sync is executed. Normally it should be false.
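
The comparison step of such a full sync can be pictured as a set difference. The following is a minimal sketch under that assumption, with hypothetical names (`PartitionAlignment`, `missingInMetastore`) and made-up partition values; a real sync also has to handle details this ignores, such as changed partition locations.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Hudi code.
final class PartitionAlignment {
  // Partitions present on storage but absent from the metastore, in sorted order.
  static Set<String> missingInMetastore(Set<String> onStorage, Set<String> inMetastore) {
    Set<String> missing = new TreeSet<>(onStorage);
    missing.removeAll(inMetastore);
    return missing;
  }

  public static void main(String[] args) {
    Set<String> onStorage = new TreeSet<>(Arrays.asList("dt=2023-03-25", "dt=2023-03-26", "dt=2023-03-27"));
    Set<String> inMetastore = new TreeSet<>(Arrays.asList("dt=2023-03-25", "dt=2023-03-27"));
    // The lost partition is the one that a full sync would add back.
    System.out.println(missingInMetastore(onStorage, inMetastore)); // [dt=2023-03-26]
  }
}
```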
   





[GitHub] [hudi] hudi-bot commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1568912812

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943",
       "triggerID" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15947",
       "triggerID" : "1486121696",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "684a048842d92c5282d400e0e042680644fa8453",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "684a048842d92c5282d400e0e042680644fa8453",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b33260a57b46cae12c2779fbf4279503bb277fff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15947) 
   * 684a048842d92c5282d400e0e042680644fa8453 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1568996609

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943",
       "triggerID" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b33260a57b46cae12c2779fbf4279503bb277fff",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15947",
       "triggerID" : "1486121696",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "684a048842d92c5282d400e0e042680644fa8453",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17484",
       "triggerID" : "684a048842d92c5282d400e0e042680644fa8453",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d90cad89063e1e8248b6828cdc1521bfc7000eb0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d90cad89063e1e8248b6828cdc1521bfc7000eb0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b33260a57b46cae12c2779fbf4279503bb277fff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15943) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15947) 
   * 684a048842d92c5282d400e0e042680644fa8453 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17484) 
   * d90cad89063e1e8248b6828cdc1521bfc7000eb0 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>






[GitHub] [hudi] yihua commented on pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on PR #8301:
URL: https://github.com/apache/hudi/pull/8301#issuecomment-1538848390

   @weimingdiit I've landed an improvement to the Hive sync: #8388. Could you rebase your PR on the latest master?

