You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/19 09:48:08 UTC

[GitHub] [hudi] xushiyan commented on a diff in pull request #6899: [HUDI-4998] Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work with META_SYNC_PARTITION_FIELDS

xushiyan commented on code in PR #6899:
URL: https://github.com/apache/hudi/pull/6899#discussion_r999206235


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")
       .withInferFunction(cfg -> {
-        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))
-            .or(() -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)));
-        if (!partitionFieldsOpt.isPresent()) {
+        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(META_SYNC_PARTITION_FIELDS));

Review Comment:
   i think it does make sense to infer from META_SYNC_PARTITION_FIELDS as users can manual set it, which should align with the partition extractor. so can you make it the 1st, then fallback to table, and then key gen



##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")
       .withInferFunction(cfg -> {
-        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(HoodieTableConfig.PARTITION_FIELDS))
-            .or(() -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)));
-        if (!partitionFieldsOpt.isPresent()) {
+        Option<String> partitionFieldsOpt = Option.ofNullable(cfg.getString(META_SYNC_PARTITION_FIELDS));

Review Comment:
   the problem is when instantiating the hoodie sync config we cannot guarantee META_SYNC_PARTITION_FIELDS is inferred before META_SYNC_PARTITION_EXTRACTOR_CLASS. so we want to still check the table and key gen configs.



##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -94,28 +94,22 @@ public class HoodieSyncConfig extends HoodieConfig {
 
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
-      .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+      .defaultValue("org.apache.hudi.hive.NonPartitionedExtractor")

Review Comment:
   @xicm the convention is we don't change default value unless there is a strong reason. we changed slash encoded extractor to multi-part extractor in 0.12.0 for handling usecases better. we don't want to change so often. besides, `MultiPartKeysValueExtractor` handles non partition case right? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org