You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "lokeshj1703 (via GitHub)" <gi...@apache.org> on 2023/03/24 09:31:07 UTC

[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs

lokeshj1703 commented on code in PR #7881:
URL: https://github.com/apache/hudi/pull/7881#discussion_r1147219106


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/bootstrap/BootstrapMode.java:
##########
@@ -18,18 +18,27 @@
 
 package org.apache.hudi.client.bootstrap;
 
+import org.apache.hudi.common.config.EnumDefault;
+import org.apache.hudi.common.config.EnumDescription;
+import org.apache.hudi.common.config.EnumFieldDescription;
+
 /**
  * Identifies different types of bootstrap.
  */
+@EnumDescription("Bootstrap types")
 public enum BootstrapMode {
   /**
    * In this mode, record level metadata is generated for each source record and both original record and metadata
    * for each record copied.
    */
+  @EnumFieldDescription("In this mode, record level metadata is generated for each source record and both original record and metadata "
+      + "for each record copied.")
   FULL_RECORD,
 
   /**
    * In this mode, record level metadata alone is generated for each source record and stored in new bootstrap location.
    */
+  @EnumDefault

Review Comment:
   I think default enum should be specified as parameter of `enumDefaultStringValueAndDocumentation` since enum could be part of many configs with different default values.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieBootstrapConfig.java:
##########
@@ -55,12 +52,10 @@ public class HoodieBootstrapConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> PARTITION_SELECTOR_REGEX_MODE = ConfigProperty
       .key("hoodie.bootstrap.mode.selector.regex.mode")
-      .defaultValue(METADATA_ONLY.name())
-      .sinceVersion("0.6.0")
-      .withValidValues(METADATA_ONLY.name(), FULL_RECORD.name())
-      .withDocumentation("Bootstrap mode to apply for partition paths, that match regex above. "
-          + "METADATA_ONLY will generate just skeleton base files with keys/footers, avoiding full cost of rewriting the dataset. "
-          + "FULL_RECORD will perform a full copy/rewrite of the data as a Hudi table.");

Review Comment:
   The enum field description is missing some of the documentation mentioned here? Is this intentional?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -2929,12 +2892,12 @@ private void autoAdjustConfigsForConcurrencyMode(boolean isLockProviderPropertyS
             // If user does not set the lock provider, likely that the concurrency mode is not set either
             // Override the configs for metadata table
             writeConfig.setValue(WRITE_CONCURRENCY_MODE.key(),
-                WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.value());

Review Comment:
   This would be a breaking change right? We are changing the possible config values.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -1912,6 +1871,10 @@ public Option<HoodieLogBlock.HoodieLogBlockType> getLogDataBlockFormat() {
         .map(HoodieLogBlock.HoodieLogBlockType::fromId);
   }
 
+  public void testEnums() {

Review Comment:
   Can we add a test to `TestConfigProperty` which does a string comparison for documentation generated?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/bootstrap/BootstrapMode.java:
##########
@@ -18,18 +18,27 @@
 
 package org.apache.hudi.client.bootstrap;
 
+import org.apache.hudi.common.config.EnumDefault;
+import org.apache.hudi.common.config.EnumDescription;
+import org.apache.hudi.common.config.EnumFieldDescription;
+
 /**
  * Identifies different types of bootstrap.
  */
+@EnumDescription("Bootstrap types")

Review Comment:
   Do we need to add more information here?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java:
##########
@@ -64,25 +62,20 @@ public class HoodieCleanConfig extends HoodieConfig {
       .withDocumentation("Number of commits to retain, without cleaning. This will be retained for num_of_commits * time_between_commits "
           + "(scheduled). This also directly translates into how much data retention the table supports for incremental queries.");
 
-  public static final ConfigProperty<String> CLEANER_HOURS_RETAINED = ConfigProperty.key("hoodie.cleaner.hours.retained")
+  public static final ConfigProperty<String> CLEANER_HOURS_RETAINED = ConfigProperty
+      .key("hoodie.cleaner.hours.retained")
       .defaultValue("24")
       .withDocumentation("Number of hours for which commits need to be retained. This config provides a more flexible option as"
           + "compared to number of commits retained for cleaning service. Setting this property ensures all the files, but the latest in a file group,"
           + " corresponding to commits with commit times older than the configured number of hours to be retained are cleaned.");
 
   public static final ConfigProperty<String> CLEANER_POLICY = ConfigProperty
       .key("hoodie.cleaner.policy")
-      .defaultValue(HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name())
-      .withDocumentation("Cleaning policy to be used. The cleaner service deletes older file slices files to re-claim space."
-          + " By default, cleaner spares the file slices written by the last N commits, determined by  " + CLEANER_COMMITS_RETAINED.key()
-          + " Long running query plans may often refer to older file slices and will break if those are cleaned, before the query has had"
-          + "   a chance to run. So, it is good to make sure that the data is retained for more than the maximum query execution time");

Review Comment:
   Similar to above this is not mentioned in new documentation. Can we check the documentation for other configs as well? Or specify the intentional changes through github comments?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -692,9 +687,17 @@ public static SpatialCurveCompositionStrategyType fromValue(String value) {
   /**
    * Layout optimization strategies such as Z-order/Hilbert space-curves, etc
    */
+  @EnumDescription("Determines ordering strategy for records layout optimization")
   public enum LayoutOptimizationStrategy {

Review Comment:
   Do you think approach like `ErrorWriteFailureStrategy` could be helpful here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org