Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/26 04:43:02 UTC

[GitHub] [hudi] xushiyan opened a new pull request, #6217: [HUDI-4474] Infer metasync configs

xushiyan opened a new pull request, #6217:
URL: https://github.com/apache/hudi/pull/6217

   - infer duplicated meta sync configs from the original configs they mirror
     - `META_SYNC_BASE_FILE_FORMAT`
       - infer from `org.apache.hudi.common.table.HoodieTableConfig.BASE_FILE_FORMAT`
     - `META_SYNC_ASSUME_DATE_PARTITION`
       - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ASSUME_DATE_PARTITIONING`
     - `META_SYNC_DECODE_PARTITION`
       - infer from `org.apache.hudi.common.table.HoodieTableConfig.URL_ENCODE_PARTITIONING`
     - `META_SYNC_USE_FILE_LISTING_FROM_METADATA`
       - infer from `org.apache.hudi.common.config.HoodieMetadataConfig.ENABLE`
   
   As proposed in https://github.com/apache/hudi/blob/master/rfc/rfc-55/rfc-55.md#compatible-changes
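
The inference described above can be sketched in isolation. This is a minimal illustration, not Hudi's implementation: `Prop` and `resolve` are hypothetical stand-ins for `ConfigProperty` and its lookup, the table-config key string `hoodie.table.base.file.format` is assumed to be the key behind `HoodieTableConfig.BASE_FILE_FORMAT`, and the precedence shown (explicit value, then inferred value, then default) is an assumption about how an infer function is applied.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; the class and method names are not part of Hudi.
final class InferSketch {

  // Simplified stand-in for a config property carrying an optional infer function.
  static final class Prop {
    final String key;
    final String defaultValue;
    final Function<Map<String, String>, Optional<String>> inferFn;

    Prop(String key, String defaultValue,
         Function<Map<String, String>, Optional<String>> inferFn) {
      this.key = key;
      this.defaultValue = defaultValue;
      this.inferFn = inferFn;
    }
  }

  // Assumed key of the original table config that the sync config mirrors.
  static final String TABLE_BASE_FILE_FORMAT_KEY = "hoodie.table.base.file.format";

  // Sync config whose value is inferred from the table config when not set explicitly.
  static final Prop SYNC_BASE_FILE_FORMAT = new Prop(
      "hoodie.datasource.hive_sync.base_file_format",
      "PARQUET",
      cfg -> Optional.ofNullable(cfg.get(TABLE_BASE_FILE_FORMAT_KEY)));

  // Assumed precedence: explicit value, then inferred value, then default.
  static String resolve(Map<String, String> cfg, Prop prop) {
    if (cfg.containsKey(prop.key)) {
      return cfg.get(prop.key);
    }
    return prop.inferFn.apply(cfg).orElse(prop.defaultValue);
  }

  public static void main(String[] args) {
    Map<String, String> cfg = new HashMap<>();
    System.out.println(resolve(cfg, SYNC_BASE_FILE_FORMAT)); // nothing set: default
    cfg.put(TABLE_BASE_FILE_FORMAT_KEY, "ORC");
    System.out.println(resolve(cfg, SYNC_BASE_FILE_FORMAT)); // inferred from table config
  }
}
```

With this shape a user who sets only the table-level option would not have to repeat it under the sync key, which appears to be the point of the change.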


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope merged pull request #6217: [HUDI-4474] Infer metasync configs

Posted by GitBox <gi...@apache.org>.
codope merged PR #6217:
URL: https://github.com/apache/hudi/pull/6217




[GitHub] [hudi] xushiyan commented on a diff in pull request #6217: [HUDI-4474] Infer metasync configs

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #6217:
URL: https://github.com/apache/hudi/pull/6217#discussion_r929518923


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -426,12 +426,8 @@ object DataSourceWriteOptions {
   @Deprecated
   val METASTORE_URIS: ConfigProperty[String] = HiveSyncConfigHolder.METASTORE_URIS
   @Deprecated
-  val hivePartitionFieldsInferFunc: JavaFunction[HoodieConfig, Option[String]] = HoodieSyncConfig.PARTITION_FIELDS_INFERENCE_FUNCTION
-  @Deprecated
   val HIVE_PARTITION_FIELDS: ConfigProperty[String] = HoodieSyncConfig.META_SYNC_PARTITION_FIELDS
   @Deprecated
-  val hivePartitionExtractorInferFunc: JavaFunction[HoodieConfig, Option[String]] = HoodieSyncConfig.PARTITION_EXTRACTOR_CLASS_FUNCTION

Review Comment:
   These infer functions shouldn't have been exposed here for users to import in the first place.





[GitHub] [hudi] xushiyan commented on a diff in pull request #6217: [HUDI-4474] Infer metasync configs

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #6217:
URL: https://github.com/apache/hudi/pull/6217#discussion_r929519658


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -72,57 +73,51 @@ public class HoodieSyncConfig extends HoodieConfig {
   public static final ConfigProperty<String> META_SYNC_BASE_FILE_FORMAT = ConfigProperty
       .key("hoodie.datasource.hive_sync.base_file_format")
       .defaultValue("PARQUET")
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(HoodieTableConfig.BASE_FILE_FORMAT)))
       .withDocumentation("Base file format for the sync.");
 
-  // If partition fields are not explicitly provided, obtain from the KeyGeneration Configs
-  public static final Function<HoodieConfig, Option<String>> PARTITION_FIELDS_INFERENCE_FUNCTION = cfg -> {
-    if (cfg.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)) {
-      return Option.of(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME));
-    } else {
-      return Option.empty();
-    }
-  };
   public static final ConfigProperty<String> META_SYNC_PARTITION_FIELDS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_fields")
       .defaultValue("")
-      .withInferFunction(PARTITION_FIELDS_INFERENCE_FUNCTION)
+      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)))
       .withDocumentation("Field in the table to use for determining hive partition columns.");
 
-  // If partition value extraction class is not explicitly provided, configure based on the partition fields.
-  public static final Function<HoodieConfig, Option<String>> PARTITION_EXTRACTOR_CLASS_FUNCTION = cfg -> {
-    if (!cfg.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)) {
-      return Option.of("org.apache.hudi.hive.NonPartitionedExtractor");
-    } else {
-      int numOfPartFields = cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME).split(",").length;
-      if (numOfPartFields == 1
-          && cfg.contains(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE)
-          && cfg.getString(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE).equals("true")) {
-        return Option.of("org.apache.hudi.hive.HiveStylePartitionValueExtractor");
-      } else {
-        return Option.of("org.apache.hudi.hive.MultiPartKeysValueExtractor");
-      }
-    }
-  };
   public static final ConfigProperty<String> META_SYNC_PARTITION_EXTRACTOR_CLASS = ConfigProperty
       .key("hoodie.datasource.hive_sync.partition_extractor_class")
       .defaultValue("org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor")
-      .withInferFunction(PARTITION_EXTRACTOR_CLASS_FUNCTION)
+      .withInferFunction(cfg -> {
+        if (cfg.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)) {
+          int numOfPartFields = cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME).split(",").length;
+          if (numOfPartFields == 1
+              && cfg.contains(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE)
+              && cfg.getString(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE).equals("true")) {
+            return Option.of("org.apache.hudi.hive.HiveStylePartitionValueExtractor");
+          } else {
+            return Option.of("org.apache.hudi.hive.MultiPartKeysValueExtractor");
+          }
+        } else {
+          return Option.of("org.apache.hudi.hive.NonPartitionedExtractor");
+        }
+      })
       .withDocumentation("Class which implements PartitionValueExtractor to extract the partition values, "
           + "default 'SlashEncodedDayPartitionValueExtractor'.");
 
   public static final ConfigProperty<String> META_SYNC_ASSUME_DATE_PARTITION = ConfigProperty
       .key("hoodie.datasource.hive_sync.assume_date_partitioning")
-      .defaultValue("false")
-      .withDocumentation("Assume partitioning is yyyy/mm/dd");
+      .defaultValue(HoodieMetadataConfig.ASSUME_DATE_PARTITIONING.defaultValue())

Review Comment:
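   The default of `HoodieMetadataConfig.ASSUME_DATE_PARTITIONING` is also "false", so the effective default is unchanged.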
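
The extractor-class inference that the diff above inlines can be read as a standalone function. The following is a sketch under assumptions rather than the code in the PR: a plain `Map` stands in for `HoodieConfig`, the class name is hypothetical, and the two key strings are assumed to be the keys behind `KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME` and `KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map replaces HoodieConfig and the key strings are assumptions.
final class ExtractorInferenceSketch {

  static final String PARTITION_FIELD_KEY = "hoodie.datasource.write.partitionpath.field";
  static final String HIVE_STYLE_KEY = "hoodie.datasource.write.hive_style_partitioning";

  // Mirrors the branch structure of the inlined infer function in the diff.
  static String inferExtractor(Map<String, String> cfg) {
    if (!cfg.containsKey(PARTITION_FIELD_KEY)) {
      // No partition path field configured: treat the table as non-partitioned.
      return "org.apache.hudi.hive.NonPartitionedExtractor";
    }
    int numOfPartFields = cfg.get(PARTITION_FIELD_KEY).split(",").length;
    boolean hiveStyle = "true".equals(cfg.get(HIVE_STYLE_KEY));
    if (numOfPartFields == 1 && hiveStyle) {
      // Exactly one partition field with hive-style partitioning enabled.
      return "org.apache.hudi.hive.HiveStylePartitionValueExtractor";
    }
    // Several partition fields, or hive-style partitioning not enabled.
    return "org.apache.hudi.hive.MultiPartKeysValueExtractor";
  }

  public static void main(String[] args) {
    Map<String, String> cfg = new HashMap<>();
    System.out.println(inferExtractor(cfg));
    cfg.put(PARTITION_FIELD_KEY, "dt");
    cfg.put(HIVE_STYLE_KEY, "true");
    System.out.println(inferExtractor(cfg));
  }
}
```

Note that in both the diff and this sketch the declared default, `SlashEncodedDayPartitionValueExtractor`, is never returned by the infer function, because every branch yields a value.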





[GitHub] [hudi] hudi-bot commented on pull request #6217: [HUDI-4474] Infer metasync configs

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6217:
URL: https://github.com/apache/hudi/pull/6217#issuecomment-1195070655

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f107f6da44514d13952316ac10675ff0de4ca526",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10347",
       "triggerID" : "f107f6da44514d13952316ac10675ff0de4ca526",
       "triggerType" : "PUSH"
     }, {
       "hash" : "38dc56a8c8dde304d7dbabbe1f6206df71954dfe",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10350",
       "triggerID" : "38dc56a8c8dde304d7dbabbe1f6206df71954dfe",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f107f6da44514d13952316ac10675ff0de4ca526 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10347) 
   * 38dc56a8c8dde304d7dbabbe1f6206df71954dfe Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10350) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6217: [HUDI-4474] Infer metasync configs

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6217:
URL: https://github.com/apache/hudi/pull/6217#issuecomment-1195024795

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f107f6da44514d13952316ac10675ff0de4ca526",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10347",
       "triggerID" : "f107f6da44514d13952316ac10675ff0de4ca526",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f107f6da44514d13952316ac10675ff0de4ca526 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10347) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6217: [HUDI-4474] Infer metasync configs

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6217:
URL: https://github.com/apache/hudi/pull/6217#issuecomment-1195033108

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f107f6da44514d13952316ac10675ff0de4ca526",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10347",
       "triggerID" : "f107f6da44514d13952316ac10675ff0de4ca526",
       "triggerType" : "PUSH"
     }, {
       "hash" : "38dc56a8c8dde304d7dbabbe1f6206df71954dfe",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "38dc56a8c8dde304d7dbabbe1f6206df71954dfe",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f107f6da44514d13952316ac10675ff0de4ca526 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10347) 
   * 38dc56a8c8dde304d7dbabbe1f6206df71954dfe UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #6217: [HUDI-4474] Infer metasync configs

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6217:
URL: https://github.com/apache/hudi/pull/6217#issuecomment-1195021829

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f107f6da44514d13952316ac10675ff0de4ca526",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f107f6da44514d13952316ac10675ff0de4ca526",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f107f6da44514d13952316ac10675ff0de4ca526 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #6217: [HUDI-4474] Infer metasync configs

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6217:
URL: https://github.com/apache/hudi/pull/6217#issuecomment-1195198086

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f107f6da44514d13952316ac10675ff0de4ca526",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10347",
       "triggerID" : "f107f6da44514d13952316ac10675ff0de4ca526",
       "triggerType" : "PUSH"
     }, {
       "hash" : "38dc56a8c8dde304d7dbabbe1f6206df71954dfe",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10350",
       "triggerID" : "38dc56a8c8dde304d7dbabbe1f6206df71954dfe",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 38dc56a8c8dde304d7dbabbe1f6206df71954dfe Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10350) 
   

