Posted to commits@hudi.apache.org by "yihua (via GitHub)" <gi...@apache.org> on 2023/03/13 06:24:25 UTC

[GitHub] [hudi] yihua commented on a diff in pull request #8128: [HUDI-5782] Tweak defaults and remove unnecessary configs after config review

yihua commented on code in PR #8128:
URL: https://github.com/apache/hudi/pull/8128#discussion_r1133478509


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java:
##########
@@ -69,6 +69,10 @@ public class HoodieIndexConfig extends HoodieConfig {
 
   private static final Logger LOG = LogManager.getLogger(HoodieIndexConfig.class);
 
+  public static final boolean DEFAULT_BLOOM_INDEX_UPDATE_PARTITION_PATH = true;
+  public static final boolean DEFAULT_SIMPLE_INDEX_UPDATE_PARTITION_PATH = true;
+
+

Review Comment:
  We don't need these variables; we can assume that the default behavior reflects them.
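To illustrate why the extra `DEFAULT_*` constants are redundant: in the config-property pattern this review refers to, the property object already carries its own default, so a second constant just duplicates it. Below is a minimal stdlib-only sketch of that pattern (`ConfigProp` and `resolve` are hypothetical stand-ins, not Hudi's actual `ConfigProperty` API):

```java
import java.util.Properties;

// Hypothetical stand-in for a config property that carries its own default.
final class ConfigProp<T> {
    final String key;
    final T defaultValue;

    ConfigProp(String key, T defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    // Look up the key, falling back to the built-in default.
    @SuppressWarnings("unchecked")
    T resolve(Properties props) {
        Object v = props.get(key);
        return v == null ? defaultValue : (T) v;
    }
}

public class ConfigPropDemo {
    // The default lives on the property itself; a separate
    // DEFAULT_BLOOM_INDEX_UPDATE_PARTITION_PATH constant adds nothing.
    static final ConfigProp<Boolean> BLOOM_INDEX_UPDATE_PARTITION_PATH =
        new ConfigProp<>("hoodie.bloom.index.update.partition.path", true);

    public static void main(String[] args) {
        Properties props = new Properties(); // user did not set the key
        System.out.println(BLOOM_INDEX_UPDATE_PARTITION_PATH.resolve(props));
    }
}
```

Callers resolve the property against user-supplied config and get the default automatically, which is the "default behavior reflects it" point made above.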



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -288,9 +288,6 @@ private HoodieWriteConfig createMetadataWriteConfig(
         .withCompactionConfig(HoodieCompactionConfig.newBuilder()
             .withInlineCompaction(false)
             .withMaxNumDeltaCommitsBeforeCompaction(writeConfig.getMetadataCompactDeltaCommitMax())
-            // by default, the HFile does not keep the metadata fields, set up as false
-            // to always use the metadata of the new record.
-            .withPreserveCommitMetadata(false)

Review Comment:
   Have you audited the code path to see there's no behavior change for metadata writing?



##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieClientOnCopyOnWriteStorage.java:
##########
@@ -199,13 +199,8 @@ private static Stream<Arguments> populateMetaFieldsParams() {
     return Arrays.stream(new Boolean[][] {{true}, {false}}).map(Arguments::of);
   }
 
-  private static Stream<Arguments> populateMetaFieldsAndPreserveMetadataParams() {
-    return Arrays.stream(new Boolean[][] {
-        {true, true},
-        {false, true},
-        {true, false},
-        {false, false}
-    }).map(Arguments::of);
+  private static Stream<Arguments> populateMetaFields() {

Review Comment:
   nit: rename to `populateMetaFieldsParams`
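For context on the diff above: once the `preserveMetadata` flag is removed, each argument set carries a single Boolean, so the four-row two-parameter provider collapses to two rows. A stdlib-only sketch of that provider shape follows (`Args` is a hypothetical stand-in for JUnit 5's `Arguments`, stubbed so the example runs without JUnit on the classpath):

```java
import java.util.Arrays;
import java.util.stream.Stream;

public class ProviderSketch {
    // Hypothetical stand-in for org.junit.jupiter.params.provider.Arguments.
    record Args(Object[] values) {
        static Args of(Object[] values) { return new Args(values); }
    }

    // Single-parameter provider: one Boolean per argument set, two rows total.
    static Stream<Args> populateMetaFieldsParams() {
        return Arrays.stream(new Boolean[][] {{true}, {false}}).map(Args::of);
    }

    public static void main(String[] args) {
        System.out.println(populateMetaFieldsParams().count());
    }
}
```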



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -48,6 +48,8 @@
         + "which optimizes the storage layout for better query performance by sorting and sizing data files.")
 public class HoodieClusteringConfig extends HoodieConfig {
 
+  public static final boolean DEFAULT_PRESERVE_COMMIT_METADATA = true;

Review Comment:
   We don't need the variable and we can assume that the default behavior reflects this.  Similar for the compaction.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -297,21 +295,15 @@ private FlinkOptions() {
       .booleanType()
       .defaultValue(false)// default read as batch
       .withDescription("Whether to skip compaction instants and avoid reading compacted base files for streaming read to improve read performance.\n"
-          + "There are two cases that this option can be used to avoid reading duplicates:\n"
-          + "1) you are definitely sure that the consumer reads [faster than/completes before] any compaction instants "
-          + "when " + HoodieCompactionConfig.PRESERVE_COMMIT_METADATA.key() + " is set to false.\n"
-          + "2) changelog mode is enabled, this option is a solution to keep data integrity");
+          + "This option can be used to avoid reading duplicates when changelog mode is enabled, it is a solution to keep data integrity\n");

Review Comment:
   @danny0405 compaction preserves the commit metadata by default, and we now make that the only behavior to avoid corruption.  I assume this is fine for Flink too; could you confirm?



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringOperator.java:
##########
@@ -133,7 +133,7 @@ public class ClusteringOperator extends TableStreamOperator<ClusteringCommitEven
 
   public ClusteringOperator(Configuration conf, RowType rowType) {
     this.conf = conf;
-    this.preserveHoodieMetadata = conf.getBoolean(HoodieClusteringConfig.PRESERVE_COMMIT_METADATA.key(), HoodieClusteringConfig.PRESERVE_COMMIT_METADATA.defaultValue());
+    this.preserveHoodieMetadata = HoodieClusteringConfig.DEFAULT_PRESERVE_COMMIT_METADATA;

Review Comment:
   nit: we don't need this and we can simplify `this.rowType` below.
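To sketch the cleanup this nit suggests (simplified types, not the real Flink/Hudi classes): with `preserveHoodieMetadata` now always true, the `conf` lookup disappears and the value can be a plain constant, leaving the constructor to do only direct assignments:

```java
// Illustrative sketch of the simplified constructor; names mirror
// ClusteringOperator but the types are stand-ins for this example.
public final class ClusteringOperatorSketch {
    // Always-on behavior replaces the former config lookup.
    private static final boolean PRESERVE_HOODIE_METADATA = true;

    private final String rowTypeName;

    ClusteringOperatorSketch(String rowTypeName) {
        this.rowTypeName = rowTypeName; // direct assignment, no conf round-trip
    }

    boolean preservesMetadata() {
        return PRESERVE_HOODIE_METADATA;
    }

    String rowTypeName() {
        return rowTypeName;
    }
}
```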



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org