You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "yihua (via GitHub)" <gi...@apache.org> on 2023/04/03 19:59:26 UTC

[GitHub] [hudi] yihua commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs

yihua commented on code in PR #7881:
URL: https://github.com/apache/hudi/pull/7881#discussion_r1156364101


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java:
##########
@@ -147,17 +129,16 @@ public class HoodieCleanConfig extends HoodieConfig {
   public static final ConfigProperty<String> FAILED_WRITES_CLEANER_POLICY = ConfigProperty
       .key("hoodie.cleaner.policy.failed.writes")
       .defaultValue(HoodieFailedWritesCleaningPolicy.EAGER.name())
+      .withEnumDocumentation(HoodieFailedWritesCleaningPolicy.class,
+          "note that LAZY policy is required when multi-writers are enabled.")

Review Comment:
   nit: capitalize the first letter.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieBootstrapConfig.java:
##########
@@ -83,8 +79,8 @@ public class HoodieBootstrapConfig extends HoodieConfig {
   public static final ConfigProperty<String> KEYGEN_TYPE = ConfigProperty
       .key("hoodie.bootstrap.keygen.type")
       .defaultValue(KeyGeneratorType.SIMPLE.name())
-      .sinceVersion("0.9.0")
-      .withDocumentation("Type of build-in key generator, currently support SIMPLE, COMPLEX, TIMESTAMP, CUSTOM, NON_PARTITION, GLOBAL_DELETE");
+      .withEnumDocumentation(KeyGeneratorType.class, "Key generator class for bootstrap")

Review Comment:
   For the second argument, is the convention to add a period (`.`) at the end or not?  I see both in different enum configs.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -3028,11 +2993,11 @@ private void validate() {
       Objects.requireNonNull(writeConfig.getString(BASE_PATH));
       if (writeConfig.isEarlyConflictDetectionEnable()) {
         checkArgument(writeConfig.getString(WRITE_CONCURRENCY_MODE)
-                .equalsIgnoreCase(WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.value()),
+                .equals(WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name()),

Review Comment:
   Same here, could we ignore case as before?



##########
hudi-common/src/main/java/org/apache/hudi/common/config/ConfigProperty.java:
##########
@@ -139,6 +144,52 @@ public ConfigProperty<T> withDocumentation(String doc) {
     return new ConfigProperty<>(key, defaultValue, docOnDefaultValue, doc, sinceVersion, deprecatedVersion, inferFunction, validValues, advanced, alternatives);
   }
 
+  public <U extends Enum<U>> ConfigProperty<T> withEnumDocumentation(Class<U> e) {
+    return withEnumDocumentation(e,"");
+  }
+
+  private <U extends Enum<U>> boolean isDefaultField(Class<U> e, Field f) {
+    if (!hasDefaultValue()) {
+      return false;
+    }
+    if (defaultValue() instanceof String) {
+      return f.getName().equals(defaultValue());
+    }
+    return Enum.valueOf(e, f.getName()).equals(defaultValue());
+  }
+
+  public <U extends Enum<U>> ConfigProperty<T> withEnumDocumentation(Class<U> e, String doc, String... internalOption) {

Review Comment:
   Could we rename this as `withDocumentation` and remove `doc` and `internalOption` for simplicity?  `doc` content can be merged to `@EnumDescription` . We can mark internal options in the docs.



##########
hudi-common/src/main/java/org/apache/hudi/common/util/queue/DisruptorWaitStrategyType.java:
##########
@@ -27,35 +30,50 @@
 /**
  * Enum for the type of waiting strategy in Disruptor Queue.
  */
+@EnumDescription("Type of waiting strategy in the Disruptor Queue")

Review Comment:
   We can keep the docs the same as before for now.  Any docs improvement can be in a separate PR.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieBootstrapConfig.java:
##########
@@ -55,12 +52,10 @@ public class HoodieBootstrapConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> PARTITION_SELECTOR_REGEX_MODE = ConfigProperty
       .key("hoodie.bootstrap.mode.selector.regex.mode")
-      .defaultValue(METADATA_ONLY.name())
-      .sinceVersion("0.6.0")
-      .withValidValues(METADATA_ONLY.name(), FULL_RECORD.name())
-      .withDocumentation("Bootstrap mode to apply for partition paths, that match regex above. "
-          + "METADATA_ONLY will generate just skeleton base files with keys/footers, avoiding full cost of rewriting the dataset. "
-          + "FULL_RECORD will perform a full copy/rewrite of the data as a Hudi table.");

Review Comment:
   @jonvex I think @lokeshj1703 means that `avoiding full cost of rewriting the dataset` is missing in the new docs to indicate the benefit.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieBootstrapConfig.java:
##########
@@ -83,8 +79,8 @@ public class HoodieBootstrapConfig extends HoodieConfig {
   public static final ConfigProperty<String> KEYGEN_TYPE = ConfigProperty
       .key("hoodie.bootstrap.keygen.type")
       .defaultValue(KeyGeneratorType.SIMPLE.name())
-      .sinceVersion("0.9.0")
-      .withDocumentation("Type of build-in key generator, currently support SIMPLE, COMPLEX, TIMESTAMP, CUSTOM, NON_PARTITION, GLOBAL_DELETE");
+      .withEnumDocumentation(KeyGeneratorType.class, "Key generator class for bootstrap")

Review Comment:
   Also, should all the descriptions be in the `@EnumDescription`, instead of having a second argument here for additional description?



##########
docker/demo/config/test-suite/multi-writer-1.properties:
##########
@@ -35,7 +35,7 @@ hoodie.datasource.write.recordkey.field=_row_key
 hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
 hoodie.datasource.write.partitionpath.field=timestamp
 
-hoodie.write.concurrency.mode=optimistic_concurrency_control
+hoodie.write.concurrency.mode=OPTIMISTIC_CONCURRENCY_CONTROL

Review Comment:
   Is upper-case config value required?  Can you still ensure the lower-case value `optimistic_concurrency_control` works as before?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -194,16 +188,18 @@ public class HoodieWriteConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> TIMELINE_LAYOUT_VERSION_NUM = ConfigProperty
       .key("hoodie.timeline.layout.version")
-      .defaultValue(Integer.toString(TimelineLayoutVersion.VERSION_1))
+      .defaultValue(Integer.toString(TimelineLayoutVersion.CURR_VERSION))
+      .withValidValues(Integer.toString(TimelineLayoutVersion.VERSION_0),Integer.toString(TimelineLayoutVersion.VERSION_1))
       .sinceVersion("0.5.1")
       .withDocumentation("Controls the layout of the timeline. Version 0 relied on renames, Version 1 (default) models "
           + "the timeline as an immutable log relying only on atomic writes for object storage.");
 
   public static final ConfigProperty<HoodieFileFormat> BASE_FILE_FORMAT = ConfigProperty
       .key("hoodie.table.base.file.format")
       .defaultValue(HoodieFileFormat.PARQUET)
-      .withAlternatives("hoodie.table.ro.file.format")
-      .withDocumentation("Base file format to store all the base file data.");
+      .withValidValues(HoodieFileFormat.PARQUET.name(), HoodieFileFormat.ORC.name(), HoodieFileFormat.HFILE.name())
+      .withEnumDocumentation(HoodieFileFormat.class, "File format to store all the base file data.", "HFILE")

Review Comment:
   Why do we have a third argument `"HFILE"` here?



##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java:
##########
@@ -265,4 +265,5 @@ public <T> String getStringOrThrow(ConfigProperty<T> configProperty, String erro
       throw new HoodieException(errorMessage);
     }
   }
+

Review Comment:
   nit: revert the new line



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java:
##########
@@ -89,24 +89,7 @@ public class HoodieCleanConfig extends HoodieConfig {
           return Option.of(HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS.name());
         }
         return Option.empty();
-      })
-      .withDocumentation("Cleaning policy to be used. The cleaner service deletes older file "

Review Comment:
   There are new docs added to this config.  Could you move them to the right place?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -277,11 +266,9 @@ public class HoodieClusteringConfig extends HoodieConfig {
    */
   public static final ConfigProperty<String> LAYOUT_OPTIMIZE_SPATIAL_CURVE_BUILD_METHOD = ConfigProperty
       .key(LAYOUT_OPTIMIZE_PARAM_PREFIX + "curve.build.method")
-      .defaultValue("direct")
-      .sinceVersion("0.10.0")
-      .withDocumentation("Controls how data is sampled to build the space-filling curves. "
-          + "Two methods: \"direct\", \"sample\". The direct method is faster than the sampling, "
-          + "however sample method would produce a better data layout.");
+      .defaultValue(SpatialCurveCompositionStrategyType.DIRECT.name())

Review Comment:
   similar here, we should make sure both lower- and upper-case names should work.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -677,95 +664,43 @@ private String getDefaultExecutionStrategyClassName(EngineType engineType) {
   /**
    * Type of a strategy for building Z-order/Hilbert space-filling curves.
    */
+  @EnumDescription("Type of a strategy for building Z-order/Hilbert space-filling curves.")
   public enum SpatialCurveCompositionStrategyType {
-    DIRECT("direct"),
-    SAMPLE("sample");
 
-    private static final Map<String, SpatialCurveCompositionStrategyType> VALUE_TO_ENUM_MAP =
-        TypeUtils.getValueToEnumMap(SpatialCurveCompositionStrategyType.class, e -> e.value);
-
-    private final String value;
-
-    SpatialCurveCompositionStrategyType(String value) {
-      this.value = value;
-    }
+    @EnumFieldDescription("Faster than sampling, but produces a worse(?) layout. NEEDS BETTER DESCRIPTION")
+    DIRECT,
 
-    public static SpatialCurveCompositionStrategyType fromValue(String value) {
-      SpatialCurveCompositionStrategyType enumValue = VALUE_TO_ENUM_MAP.get(value);
-      if (enumValue == null) {
-        throw new HoodieException(String.format("Invalid value (%s)", value));
-      }
-
-      return enumValue;
-    }
+    @EnumFieldDescription("Slower than sampling, but produces a better(?) layout. NEEDS BETTER DESCRIPTION")
+    SAMPLE
   }
 
   /**
    * Layout optimization strategies such as Z-order/Hilbert space-curves, etc
    */
+  @EnumDescription("Determines ordering strategy for records layout optimization")
   public enum LayoutOptimizationStrategy {
-    LINEAR("linear"),
-    ZORDER("z-order"),
-    HILBERT("hilbert");
-
-    private static final Map<String, LayoutOptimizationStrategy> VALUE_TO_ENUM_MAP =
-        TypeUtils.getValueToEnumMap(LayoutOptimizationStrategy.class, e -> e.value);
-
-    private final String value;
-
-    LayoutOptimizationStrategy(String value) {
-      this.value = value;
-    }
 
-    @Nonnull
-    public static LayoutOptimizationStrategy fromValue(String value) {
-      LayoutOptimizationStrategy enumValue = VALUE_TO_ENUM_MAP.get(value);
-      if (enumValue == null) {
-        throw new HoodieException(String.format("Invalid value (%s)", value));
-      }
+    @EnumFieldDescription("Order records lexicographically.")
+    LINEAR,
 
-      return enumValue;
-    }
+    @EnumFieldDescription("Order records along Z-order spatial-curve")
+    ZORDER,
 
-    public String getValue() {
-      return value;
-    }
+    @EnumFieldDescription("Order records along Hilbert's spatial-curve")
+    HILBERT
   }
 
+  @EnumDescription("Clustering Operator ADD DESCRIPTION")

Review Comment:
   docs: `Clustering mode to use`



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleaningTriggerStrategy.java:
##########
@@ -18,7 +18,14 @@
 
 package org.apache.hudi.table.action.clean;
 
+import org.apache.hudi.common.config.EnumDefault;
+import org.apache.hudi.common.config.EnumDescription;
+import org.apache.hudi.common.config.EnumFieldDescription;
+
+@EnumDescription("Controls when cleaning is scheduled")
 public enum CleaningTriggerStrategy {

Review Comment:
   Let's keep it.  It is meant for extensibility.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org